November 21st, 2010
Papers are solicited for the 2011 Symposium on Application Accelerators in High-Performance Computing. Presentations from technology developers and the academic user community are invited on the following topics:
- novel accelerator processors, systems, and architectures
- integration of accelerators with high-performance computing systems
- programming models for accelerator-based computing
- languages and compilers for accelerator-based computing
- run-time environments, profiling and debugging tools for accelerator-based computing
- scientific and engineering applications that use application accelerators
In addition to the general session, submissions are invited for the following domain-specific topics:
- Computational chemistry on accelerators (Chair: TBD)
- Lattice QCD (Chair: Steven Gottlieb, Indiana University, Bloomington)
- Weather and climate modeling (Chair: John Michalakes, National Renewable Energy Laboratory)
- Bioinformatics (Chair: TBD)
Submissions are due May 6, 2011, and more information can be found at the symposium website www.saahpc.org.
November 21st, 2010
We are pleased to announce High-Performance Graphics 2011. High Performance Graphics is the leading international forum for performance-oriented graphics systems research including innovative algorithms, efficient implementations, and hardware architecture. The conference brings together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications. High Performance Graphics was founded in 2009 to synthesize and expand on two important and well-respected conferences in computer graphics:
- Graphics Hardware: an annual conference focusing on graphics hardware, architecture, and systems since 1986; and
- Interactive Ray Tracing: an innovative symposium begun in 2006 focusing on the emerging field of interactive ray tracing and global illumination techniques.
By combining and expanding these two communities, we bring to authors and attendees the best of both fields and a conference covering a broad range of interactive 3D graphics systems and algorithm research.
Sponsored by ACM SIGGRAPH and Eurographics (pending)
The program features three days of paper and industry presentations, with ample time for discussions during breaks, lunches, and the conference banquet.
The conference, which will take place on August 5-7, is co-located with ACM SIGGRAPH 2011 in Vancouver, Canada.
The conference website is located at http://www.highperformancegraphics.org/
We invite original and innovative performance-oriented contributions from all areas of graphics, including (but not limited to) hardware architectures, rendering, physics, animation, AI, simulation, and data structures.
November 18th, 2010
The Euler-Lagrange (EL) framework is the most widely-used strategy for solving variational optic flow methods. We present the first approach that solves the EL equations of state-of-the-art methods on sequences with 640×480 pixels in near-realtime on GPUs. This performance is achieved by combining two ideas: (i) We extend the recently proposed Fast Explicit Diffusion (FED) scheme to optic flow, and additionally embed it into a coarse-to-fine strategy. (ii) We parallelise our complete algorithm on a GPU, where a careful optimisation of global memory operations and an efficient use of on-chip memory guarantee a good performance. Applying our approach to the variational ‘Complementary Optic Flow’ method (Zimmer et al. (2009)), we obtain highly accurate flow fields in less than a second. This currently constitutes the fastest method in the top 10 of the widely used Middlebury benchmark.
(Pascal Gwosdek, Henning Zimmer, Sven Grewenig, Andrés Bruhn and Joachim Weickert: “A Highly Efficient GPU Implementation for Variational Optic Flow Based on the Euler-Lagrange Framework”, Proceedings of the ECCV Workshop for Computer Vision with GPUs, Sep 2010.) [Project webpage with PDF, sources and additional information]
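To give a flavour of the Fast Explicit Diffusion (FED) idea mentioned in the abstract, here is a minimal NumPy sketch for plain homogeneous 2D diffusion. The varying step-size formula follows the published FED scheme; everything else (the `fed_tau` and `fed_cycle` helpers, the impulse image) is illustrative and is not the authors' optic-flow solver, which solves coupled Euler-Lagrange equations on the GPU.

```python
import numpy as np

def fed_tau(n, tau_max=0.25):
    """Varying FED time steps for one cycle of n explicit steps.
    tau_max = 0.25 is the stability limit of the explicit scheme for
    the 4-neighbour Laplacian; individual FED steps may exceed it,
    but the cycle as a whole remains stable."""
    i = np.arange(n)
    return tau_max / (2.0 * np.cos(np.pi * (2 * i + 1) / (4 * n + 2)) ** 2)

def fed_cycle(u, n):
    """One FED cycle: n explicit diffusion steps with varying step
    sizes (Neumann boundary conditions via edge padding)."""
    for tau in fed_tau(n):
        p = np.pad(u, 1, mode="edge")
        lap = (p[:-2, 1:-1] + p[2:, 1:-1] +
               p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u)
        u = u + tau * lap
    return u

img = np.zeros((32, 32))
img[16, 16] = 1.0          # impulse image
out = fed_cycle(img, n=10)  # smooths the impulse; total mass is conserved
```

One cycle of n steps covers a total diffusion time proportional to n², which is why FED cycles reach large smoothing times with few explicit steps; the paper embeds this idea into a coarse-to-fine solver for the nonlinear optic-flow equations.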
November 17th, 2010
The application period for the NVIDIA Graduate Fellowship Program is now open. We are currently accepting applications for the 2011-2012 academic year. The deadline to apply is 11:59PM PST on February 3, 2011.
NVIDIA has long believed that investing in university talent is beneficial to the industry and key to our continued growth and success. The NVIDIA Graduate Fellowship Program provides funding to Ph.D. students who are researching topics that will lead to major advances in the graphics and high-performance computing industries, and are investigating innovative ways of leveraging the power of the GPU. We select students each year who have the talent, aptitude and initiative to work closely with us early in their careers. Recipients not only receive crucial funding for their research, but are able to conduct groundbreaking work with access to NVIDIA products, technology and some of the most talented minds in the field.
For complete details including application instructions, requirements, benefits, and eligibility, visit the NVIDIA Graduate Fellowship website.
November 16th, 2010
The International Journal of Computer Science and Security (IJCSS) is a refereed online journal which is a forum for publication of current research in computer science and computer security technologies. It considers any material dealing primarily with the technological aspects of computer science and computer security. The journal is targeted to be read by academics, scholars, advanced students, practitioners, and those seeking an update on current experience and future prospects in relation to all aspects of computer science in general, with a specific focus on computer security themes. Subjects covered include: access control, computer security, cryptography, communications and data security, databases, electronic commerce, multimedia, bioinformatics, signal processing, and image processing.
November 16th, 2010
The 2011 Spring Simulation Multiconference will feature the 19th High Performance Computing Symposium (HPC 2011), devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and distributed simulations. Along with these new capabilities come new challenges in computing and system modeling. The goal of HPC 2011 is to encourage innovation in high performance computing and communication technologies and to promote synergistic advances in modeling methodologies and simulation. It will promote the exchange of ideas and information between universities, industry, and national laboratories about new developments in system modeling, high performance computing and communication, and scientific computing and simulation.
Topics of interest include:
- high performance/large scale application case studies,
- GPU, multicore, and many-core analysis and applications,
- power aware computing,
- cloud, distributed, and grid computing,
- asynchronous numerical methods and programming,
- hybrid system modeling and simulation,
- visualization and data management,
- problem solving environments,
- tools and environments for coupling parallel codes,
- parallel algorithms and architectures,
- high performance software tools,
- resilience at the simulation level,
- component technologies for high performance computing.
More information can be found on the webpage: http://www.cs.vt.edu/hpc2011/
November 16th, 2010
The goal of this workshop, held in conjunction with ASPLOS XVI (Newport Beach, CA USA, March 5-6, 2011), is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as evaluate applications that have been able to harness the horsepower provided by these platforms. This year’s workshop is particularly interested in new heterogeneous GPU platforms. Papers are being sought on many aspects of GPUs, including (but not limited to):
- GPU applications
- GPU compilation
- GPU programming environments
- GPU power/efficiency
- GPU architectures
- GPU benchmarking/measurements
- Multi-GPU systems
- Heterogeneous GPU platforms
Paper Submission: Authors should submit an 8-page paper in ACM double-column style using the directions on the conference website at http://www.ece.neu.edu/GPGPU.
Organizers: John Cavazos (University of Delaware) and David Kaeli (Northeastern University)
November 16th, 2010
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks.
(John E. Stone, David J. Hardy, Ivan S. Ufimtsev, and Klaus Schulten: “GPU-Accelerated Molecular Modeling Coming of Age”, Journal of Molecular Graphics and Modelling, Volume 29, Issue 2, September 2010, Pages 116-125. [DOI])
October 27th, 2010
The emergence of Graphics Processing Units (GPUs) as a potential alternative to conventional general-purpose processors has led to significant interest in these architectures from both the academic community and the High Performance Computing (HPC) industry. While GPUs look likely to deliver unparalleled levels of performance, published studies claiming performance improvements in excess of 30,000x are misleading. Significant on-node performance improvements have been demonstrated for code kernels and algorithms amenable to GPU acceleration; studies demonstrating comparable results for full scientific applications requiring multiple-GPU architectures are rare.
In this paper we present an analysis of a port of the NAS LU benchmark to NVIDIA’s Compute Unified Device Architecture (CUDA) – the most stable GPU programming model currently available. Our solution is also extended to multiple nodes and multiple GPU devices.
Runtime performance on several GPUs is presented, ranging from low-end, consumer-grade cards such as the 8400GS to NVIDIA’s flagship Fermi HPC processor found in the recently released C2050. We compare the runtimes of these devices to several processors including those from Intel, AMD and IBM.
In addition, we utilise a recently developed performance model of LU. With this we predict the runtime performance of LU on large-scale distributed GPU clusters, which are expected to become commonplace in future high-end HPC architectural solutions.
(S.J. Pennycook, S.D. Hammond, S.A. Jarvis and G.R. Mudalige: “Implementation of the NAS-LU Benchmark”, 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), held as part of Supercomputing 2010 (SC’10), New Orleans, LA, USA. [PDF])
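To illustrate the kind of analytic prediction an LU-style performance model supports, here is a toy sketch. The model form (compute time plus per-sweep halo-exchange cost) and every constant in it are invented for illustration; the actual model in the paper is considerably more detailed.

```python
# Hypothetical illustration only: a toy analytic model in the spirit of
# the LU performance model discussed above. All constants are invented.
def predict_runtime(grid_points, nodes, flops_per_point=500.0,
                    node_gflops=50.0, sweeps=250, latency_s=5e-6,
                    halo_bytes=2e5, bandwidth_bps=2.5e9):
    """Predicted runtime (seconds): perfectly parallel compute phase
    plus a fixed communication cost per wavefront sweep."""
    compute = grid_points * flops_per_point / (nodes * node_gflops * 1e9)
    comm = sweeps * (latency_s + halo_bytes / bandwidth_bps)
    return compute + comm

# Extrapolating to larger (hypothetical) clusters:
for n in (16, 64, 256):
    print(n, "nodes:", round(predict_runtime(1024**3, n), 3), "s")
```

Even this toy model shows the characteristic behaviour: compute time shrinks with node count while communication cost does not, so scaling eventually saturates.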
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images, represented as 2-dimensional matrices of single-precision floating-point values, using operations involving dot-products and additions. We implement this algorithm on an NVIDIA Fermi GPU (Tesla 2050) using CUDA, and also manually parallelize it for the Intel Xeon X5680 (Westmere) and IBM Power7 multi-core processors. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized versions on the multi-core CPUs. A number of optimizations were performed for the GPU implementation on the Fermi, including blocking for Fermi’s configurable on-chip memory architecture. Experimental results show that on a single multi-core processor, the manually parallelized versions of the correlation application run only a small factor slower than the CUDA version executing on the Fermi: 1.005s on Power7 and 3.49s on the Intel X5680, against 465ms on the Fermi. On a two-processor Power7 system, performance approaches that of the Fermi (650ms), while the Intel version runs in 1.78s. These results conclusively demonstrate that the performance of the GPU memory subsystem is critical to effectively harnessing its computational capabilities. For the correlation application, significantly more effort was put into developing the GPU version than the CPU ones (several days versus a few hours). Our experience presents compelling evidence that performance comparable to that of GPUs can be achieved with much greater productivity on modern multi-core CPUs.
(R. Bordawekar and U. Bondhugula and R. Rao: “Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU”, Technical Report, IBM T. J. Watson Research Center, 2010 [PDF])
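The dot-product-based correlation the report describes can be sketched in a few lines of NumPy. This is a minimal single-threaded illustration of the operation itself; the `cross_correlate` function, its shapes, and the example data are assumptions, not the report's code, which is carefully blocked, vectorised, and parallelised on each platform.

```python
import numpy as np

def cross_correlate(image, ref):
    """Slide a reference patch over an image; each output value is the
    dot-product of the patch with the overlapping image window
    (single precision, as in the report)."""
    ih, iw = image.shape
    rh, rw = ref.shape
    out = np.empty((ih - rh + 1, iw - rw + 1), dtype=np.float32)
    r = ref.ravel()
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.dot(image[y:y + rh, x:x + rw].ravel(), r)
    return out

img = np.arange(16, dtype=np.float32).reshape(4, 4)
ref = np.ones((2, 2), dtype=np.float32)
print(cross_correlate(img, ref))  # each entry: sum over a 2x2 window
```

The inner dot-products are embarrassingly parallel across output positions, which is why the computation maps well both to CUDA thread blocks and to pthreads/OpenMP with SIMD intrinsics, and why memory-subsystem behaviour (reuse of overlapping windows) dominates the achievable performance on either platform.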