In this tutorial, NVIDIA engineers and academic and industrial researchers will present CUDA and discuss its advanced use for science and engineering. The tutorial will demonstrate CUDA with traditional HPC examples including BLAS, FFT, and integration with Fortran and high-level languages (MATLAB, Mathematica, Python) and describe in detail the programming model at the heart of it all. It will then turn to advanced topics including optimizing CUDA programs, CUDA floating point performance and accuracy, and CUDA programming strategies and tips. Finally the tutorial will present detailed case studies in which domain scientists will describe their experience using CUDA to accelerate mature, deployed, real-world science codes. Scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes (see http://www.nvidia.com/cuda for a list of codes, academic papers and commercial packages based on CUDA). Presenters include Massimiliano Fatica (NVIDIA), Mark Harris (NVIDIA), Patrick LeGresley (NVIDIA), and Jim Phillips (UIUC). Follow this link to register.
ISC 2008 Tutorial: High Performance Computing with CUDA
June 6th, 2008
1st Annual UMD GPGPU Programming Contest
May 28th, 2008
The University of Maryland is sponsoring a GPGPU programming contest. All entries will be released under version 3 of the GPL at the conclusion of the contest. Contestants are asked to submit code for sparse matrix multiplication. UMD will evaluate entries on both vector/sparse matrix and sparse matrix/sparse matrix multiplications, using a variety of inputs. As the contest progresses, UMD will update the LeaderBoard regularly, so contestants will have some idea of where they stand. Contestants are welcome to make as many entries as they want, so submit early and then tweak your designs. Entries can be written in either GLSL or CUDA. Prizes include NVIDIA Quadro FX 5600 GPUs, sponsored by NVIDIA. (http://scriptroute.cs.umd.edu/gpucompete/)
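For readers new to the problem, the sparse matrix/vector case can be sketched on the CPU as follows. This is only an illustration in Python, not contest code: the compressed sparse row (CSR) layout and the names `csr_matvec`, `values`, `col_idx`, and `row_ptr` are assumptions made here, since the announcement does not specify a storage format.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a sparse matrix stored in CSR form by a dense vector x.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to the dot product.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_matvec(values, col_idx, row_ptr, x)
```

Each output row is independent of the others, which is what makes this operation a natural candidate for one-thread-per-row (or finer) parallelization on a GPU.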
A Fast Similarity Join Algorithm Using Graphics Processing Units
May 25th, 2008
This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common spatial database operation with many applications. A similarity join takes two sets of points A, B and returns pairs p ∈ A, q ∈ B where the distance D(p,q) ≤ ε. An algorithm named LSS is presented that executes on a GPU, taking advantage of the GPU’s parallelism and large data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiencies and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)
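To make the definition concrete, here is a small Python sketch. It is not the LSS algorithm from the paper. The first function is a brute-force reference that assumes D is the Euclidean distance; the second shows the sort-and-search idea only in one dimension, where a sorted array plays the role that the space-filling curves play in higher dimensions. All function names are hypothetical.

```python
import bisect
import math


def similarity_join_bruteforce(A, B, eps):
    """Return every pair (p, q), p in A and q in B, with D(p, q) <= eps.

    Assumes D is Euclidean distance; checks all |A| * |B| pairs.
    """
    return [(p, q) for p in A for q in B if math.dist(p, q) <= eps]


def similarity_join_1d(A, B, eps):
    """1-D illustration of sort-and-search: sort B once, then
    binary-search the interval [p - eps, p + eps] for each p in A."""
    B_sorted = sorted(B)
    pairs = []
    for p in A:
        lo = bisect.bisect_left(B_sorted, p - eps)
        hi = bisect.bisect_right(B_sorted, p + eps)
        pairs.extend((p, q) for q in B_sorted[lo:hi])
    return pairs


# Hypothetical sample data.
pairs_2d = similarity_join_bruteforce(
    [(0, 0), (3, 4)], [(0, 1), (3, 3), (10, 10)], 1.0
)
pairs_1d = similarity_join_1d([1.0, 5.0], [0.5, 1.4, 4.0, 9.0], 0.5)
```

The 1-D version does one sort plus two binary searches per query point instead of comparing every pair, which hints at why sorting and searching are attractive primitives for this problem.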
Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs
May 25th, 2008
This paper by Cabido et al. presents a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1100% faster than the CPU version of the algorithm. (Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs. Raul Cabido, Antonio S. Montemayor, Juan Jose Pantrigo, and Bryson R. Payne. Machine Vision and Applications, Springer, 2008.)
GPU acceleration of cutoff pair potentials for molecular modeling applications
May 25th, 2008
The advent of systems biology requires the simulation of ever-larger biomolecular systems, demanding a commensurate growth in computational power. This paper examines the use of the NVIDIA Tesla C870 graphics card programmed through the CUDA toolkit to accelerate the calculation of cutoff pair potentials, one of the most prevalent computations required by many different molecular modeling applications. The paper presents algorithms to calculate electrostatic potential maps for cutoff pair potentials. Whereas a straightforward approach for decomposing atom data leads to low computational efficiency, a new strategy enables fine-grained spatial decomposition of atom data that maps efficiently to the C870's memory system while increasing work efficiency of atom data traversal by a factor of 5. The memory addressing flexibility exposed through CUDA’s SPMD programming model is crucial in enabling this new strategy. An implementation of the new algorithm provides a greater than threefold performance improvement over our previously published implementation and runs 12 to 20 times faster than optimized CPU-only code. The lessons learned are generally applicable to algorithms accelerated by uniform grid spatial decomposition. (C. I. Rodrigues, D. J. Hardy, J. E. Stone, K. Schulten, W. W. Hwu, GPU acceleration of cutoff pair potentials for molecular modeling applications. Proceedings of the 2008 Conference On Computing Frontiers, pp.273-282, 2008.) (http://www.ks.uiuc.edu/Research/gpu/)
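The general idea of a cutoff potential map combined with uniform grid spatial decomposition can be sketched in a few lines of CPU-side Python. This is not the paper's CUDA algorithm. It assumes a plain Coulomb term q/r truncated at the cutoff (production codes typically apply a smoothing function), unit constants, and cubic bins whose side equals the cutoff; the function names are hypothetical.

```python
import math
from collections import defaultdict


def potential_direct(atoms, points, cutoff):
    """Reference version: for each map point, sum q / r over all atoms
    with 0 < r <= cutoff. atoms is a list of (x, y, z, q) tuples."""
    out = []
    for p in points:
        v = 0.0
        for (x, y, z, q) in atoms:
            r = math.dist(p, (x, y, z))
            if 0.0 < r <= cutoff:
                v += q / r
        out.append(v)
    return out


def potential_binned(atoms, points, cutoff):
    """Same result, but atoms are first hashed into cubic bins of side
    `cutoff`, so each point only visits its 27 neighboring bins."""
    bins = defaultdict(list)
    for atom in atoms:
        key = tuple(math.floor(c / cutoff) for c in atom[:3])
        bins[key].append(atom)
    out = []
    for p in points:
        bx, by, bz = (math.floor(c / cutoff) for c in p)
        v = 0.0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    # Missing bins are empty; .get avoids creating them.
                    for (x, y, z, q) in bins.get((bx + dx, by + dy, bz + dz), ()):
                        r = math.dist(p, (x, y, z))
                        if 0.0 < r <= cutoff:
                            v += q / r
        out.append(v)
    return out


# Hypothetical sample data: three charges and two map points.
atoms = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0), (5.0, 5.0, 5.0, 2.0)]
points = [(0.5, 0.0, 0.0), (5.0, 5.0, 4.0)]
direct = potential_direct(atoms, points, 2.0)
binned = potential_binned(atoms, points, 2.0)
```

Because the bin side equals the cutoff, any atom within range of a point must sit in one of the 27 surrounding bins, so the binned version skips distant atoms without changing the result (up to floating-point summation order).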
GPU Computing
May 25th, 2008
Abstract: “The graphics processing unit (GPU) has become an integral part of today’s mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU’s rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.” (J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, J. C. Phillips, “GPU Computing”, Proceedings of the IEEE, vol.96, no.5, pp.879-899, May 2008)
CIGPU 5 June 2008 Hong Kong additional technical discussion
May 25th, 2008
In addition to the papers already announced, Dr. Simon Harding (Memorial University, Newfoundland) and Dr. Tien-Tsin Wong (The Chinese University of Hong Kong) will lead a discussion on the practicalities of running evolution on modern graphics cards. They will contrast the current leading GPGPU tools considering ease of use, and support for debugging and performance monitoring. CIGPU will close with a short session considering the future of computational intelligence on GPUs.
Graph Layout on the GPU
May 25th, 2008
A graph is an ordered pair G=(V,E) where V is a set of nodes and E is a set of edges connecting nodes. Graph drawing addresses the problem of creating geometric representations of graphs. Unlike matrices or images, graphs are unstructured and hence graph layout may not seem to be suitable for acceleration on the GPU. These papers present two GPU-accelerated graph drawing algorithms which are able to quickly compute aesthetic layouts of large graphs. One is for the layout of a single graph and the other is for computing stable layouts of a sequence of graphs. Speedups of 5.5x to 17x relative to a CPU implementation are demonstrated. (Yaniv Frishman and Ayellet Tal, Multi-Level Graph Layout on the GPU, IEEE Transactions on Visualization and Computer Graphics (Proceedings Information Visualization 2007), 13(6):1310-1317, 2007)
(Yaniv Frishman and Ayellet Tal, Online Dynamic Graph Drawing, accepted to IEEE Transactions on Visualization and Computer Graphics)
gDEBugger V4.1 Adds Geometry Shaders Support and new ATI Performance Metrics Integration
May 25th, 2008
The new gDEBugger V4.1 adds Geometry Shader Support and enables developers to view allocated geometry shader objects, shader source code and properties. It also allows the developer to Edit and Continue shaders on the fly. Support for the new ATI (AMD) driver performance metrics infrastructure has been added. This integration enables users to view ATI performance metrics such as hardware utilization, vertex wait for pixel, pixel wait for vertex, overdraw and more. These performance metrics together with gDEBugger’s Performance Analysis Toolbar provide a powerful solution for locating graphics system performance bottlenecks. gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API, letting programmers see what is happening within the graphics system implementation to find bugs and optimize OpenGL application performance. gDEBugger runs on Microsoft Windows and Linux operating systems. (http://www.gremedy.com)
PRACE award presented to young scientist at ISC’08 for GPGPU work
May 20th, 2008
From this article: “PRACE, Partnership for Advanced Computing in Europe, awarded a prize for the best scientific paper submitted to ISC’08 by a European student or young scientist on petascaling. The authors of the award winning paper are Stefan Turek, Dominik Göddeke, Christian Becker, Sven H.M. Buijssen and Hilmar Wobker from the Institute of Applied Mathematics, Dortmund University of Technology, Germany. Their work, UCHPC: UnConventional High Performance Computing for Finite Element Simulations, was selected by the ISC’08 Award Committee, headed by Michael Resch, High Performance Computing Center Stuttgart. Achim Bachem, Chairman of the Board Forschungszentrum Jülich and PRACE coordinator presented the PRACE Award at the ISC’08 opening ceremony in Dresden on Wednesday, 18 June. Dominik Göddeke, Ph.D. student in the team of Professor Stefan Turek will receive a sponsorship for the participation in a conference relevant to Petascale computing.” Dominik has been an active GPGPU researcher for several years, and is one of the most active and helpful contributors to the GPGPU.org forums. (PRACE award presented to young scientist at ISC’08)