Rapid evolution of GPUs in performance, architecture, and programmability provides general and scientific computational potential beyond their primary purpose, graphics processing. This work presents an efficient algorithm for solving symmetric and positive definite linear systems using the GPU. Using the decomposition algorithm and other basic building blocks for linear algebra on the GPU, the paper demonstrates a GPU-powered linear program solver based on a Primal-Dual Interior-Point Method. (Cholesky Decomposition and Linear Programming on a GPU, Jin Hyuk Jung, Scholarly Paper Directed by Dianne P. O’Leary, Department of Computer Science, University of Maryland, 2006.)
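As a rough illustration of the underlying method (not the paper's GPU implementation), the following CPU-side sketch in plain Python solves a symmetric positive definite system by Cholesky factorization followed by two triangular solves. The function names `cholesky` and `solve_spd` and the sample matrix are assumptions made for this example.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L * L^T for a symmetric positive definite A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] after removing earlier columns.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def solve_spd(a, b):
    """Solve A x = b using A = L L^T: first L y = b, then L^T x = y."""
    l = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution with L^T
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


a = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
print(solve_spd(a, [-20.0, -43.0, 192.0]))  # [1.0, 2.0, 3.0]
```

In an interior-point solver, a system of this kind is typically solved once per iteration, which is why the factorization is a natural candidate for acceleration.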
This paper by Govindaraju et al. describes a high-performance FFT algorithm on GPUs. The algorithm is highly tuned for GPUs using memory optimizations. It further improves performance using pipelining strategies. In practice, it is able to achieve 4x higher computational performance on a $500 NVIDIA GPU than optimized single precision FFT algorithms on high-end CPUs costing $1500. (“Efficient memory model for scientific algorithms on graphics processors”, Naga Govindaraju, Scott Larsen, Jim Gray and Dinesh Manocha, UNC Tech. Report 2006)
Universal employment of modern graphics hardware by the example of the optimization of a speech recognition system
May 24th, 2006
In this master's thesis by Christian Fenzl (completed at the University of Applied Sciences in Darmstadt), an easy-to-use framework is implemented, together with demos that show the main concepts of GPGPU. A further demo implementation calculates scores on feature vectors used in a speech recognition system, running about 12 times faster than an equivalent CPU implementation. An application with several demos using the framework is available, including the fully documented source code (English) and the paper itself (German). The framework code is especially recommended for GPGPU beginners who want to study OpenGL and DirectX code that shows how GPGPU programs can be developed.
This paper by Dietz et al. from ICCS 2006 details and microbenchmarks the use of pairs of native precision values to obtain higher accuracy results using DSP, SWAR, and GPU hardware. It also discusses a way to speculatively use lower precision, recomputing with higher precision only when accuracy constraints are not met. (Floating-Point Computation with Just Enough Accuracy)
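The idea of carrying a value as a pair of native-precision numbers can be sketched as follows. This is a generic illustration using Knuth's error-free two-sum, not code from the paper; Python floats (IEEE doubles) stand in for the native precision, and the names `two_sum` and `pair_add` are assumptions made for this example.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) where s = fl(a + b) and s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def pair_add(x, y):
    """Add two (hi, lo) pairs; the unevaluated sum hi + lo is the represented value."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]          # fold in the low-order parts
    return two_sum(s, e)      # renormalize so that lo is small relative to hi


# A single native value loses every increment; the pair keeps them in `lo`.
naive = 1.0
acc = (1.0, 0.0)
for _ in range(100):
    naive += 1e-17
    acc = pair_add(acc, (1e-17, 0.0))

print(naive == 1.0)                                  # the increments were rounded away
print(abs((acc[0] - 1.0) + acc[1] - 1e-15) < 1e-25)  # the pair retained them

# The two-sum step itself is exact, which rational arithmetic can confirm.
s, e = two_sum(0.1, 0.2)
print(Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2))
```

The cost is several native operations per addition, which is why recomputing at higher precision only when an accuracy check fails can be attractive.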
To focus and facilitate research on real-time ray tracing, a new forum is being created for this rapidly developing field: the 2006 IEEE Symposium on Interactive Ray Tracing, sponsored by the IEEE Computer Society and the IEEE Visualization and Graphics Technical Committee (pending). The Call For Participation is now online and contributions on Ray Tracing on GPUs are invited.
This workshop, to be held May 23-24 2006 in Chapel Hill, North Carolina, will address some recent developments on new commodity architectures, including GPUs, multi-core CPUs, the Cell processor, PPU and other emerging commodity architectures. Some of the issues to be examined in the workshop include the software challenges that arise in programming these new commodity architectures and their impact on different applications and high-performance computing. The workshop will bring together leading researchers and designers from academia, research labs, industrial organizations and federal agencies. A call for posters is now online. (EDGE Workshop 2006)
GPUTeraSort sorts billion-record wide-key databases using the data and task parallelism on the graphics processing unit (GPU) to perform memory-intensive and compute-intensive tasks while the CPU performs I/O and resource management. It exploits both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface to achieve higher aggregate memory bandwidth than purely CPU-based algorithms. It also pipelines disk transfers to achieve near-peak I/O performance. GPUTeraSort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. We tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with a $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60 GB for a penny, the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks. (GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management. Naga K. Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. Proceedings of ACM SIGMOD 2006.)
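The two-phase structure described above is that of a classic external merge sort. The sketch below shows only that structure, under simplifying assumptions: in-memory lists stand in for disk files, Python's built-in `sorted` stands in for the GPU sort, and the names `generate_runs`, `merge_runs`, and `run_size` are hypothetical.

```python
import heapq


def generate_runs(records, run_size, key):
    """Phase 1: cut the input into chunks that fit in memory and sort each one.

    sorted() is a stand-in for the GPU sorting step; each returned list
    plays the role of a sorted run written back to disk.
    """
    runs = []
    for start in range(0, len(records), run_size):
        chunk = records[start:start + run_size]
        runs.append(sorted(chunk, key=key))
    return runs


def merge_runs(runs, key):
    """Phase 2: k-way merge of the sorted runs into a single sorted output."""
    return list(heapq.merge(*runs, key=key))


# Records are (key, payload) pairs; only the key takes part in comparisons.
records = [(k, "payload-%d" % k) for k in (42, 7, 19, 3, 88, 61, 15, 70, 1, 54)]
runs = generate_runs(records, run_size=4, key=lambda r: r[0])
result = merge_runs(runs, key=lambda r: r[0])
print([r[0] for r in result])  # [1, 3, 7, 15, 19, 42, 54, 61, 70, 88]
```

In the system described above the phases are additionally pipelined, so that disk transfers overlap with key building and sorting; this sketch runs them one after the other.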
At GDC 2006 in San Jose next week, Havok will announce Havok FX, a game physics framework for GPUs. There are two talks about Havok FX:
Havok FX: GPU-accelerated Physics for PC Games
Speaker: Andrew Bond (Havok)
This session introduces Havok’s latest innovation for game physics: Havok FX, which enables real-time processing of thousands of rigid-body objects on current and next generation GPUs. Havok’s general approach to GPU Effects Physics will be covered, as well as tool-chain requirements and trade-offs with game-critical, game-play physics processing on the CPU.
Physics Simulation on NVIDIA GPUs
Speakers: Simon Green, Mark Harris (NVIDIA)
Havok FX leverages state of the art software and hardware technology from NVIDIA to extend the capabilities of NVIDIA GPUs and SLI multi-GPU systems to include physics processing for massive real-time effects. In this presentation NVIDIA and Havok engineers will describe how Havok FX utilizes NVIDIA technology to simulate and render thousands of particles and rigid bodies in games. Live real-time demos will demonstrate the high performance available with current GPUs and provide a look into the future of physics processing on NVIDIA GPUs.
Using the GPU to accelerate ray tracing may seem like a natural choice due to the highly parallel nature of the problem. However, determining the most versatile GPU data structure for scene storage and traversal is a challenge. In this paper, we introduce a new method for quick intersection of triangular meshes on the GPU. The method uses a threaded bounding volume hierarchy built from a geometry image, which can be efficiently traversed and constructed entirely on the GPU. This acceleration scheme is highly competitive with other GPU ray tracing methods, while allowing for both dynamic geometry and an efficient level-of-detail scheme at no extra cost. (Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images. Nathan A. Carr, Jared Hoberock, Keenan Crane, and John C. Hart. To appear in Proceedings of Graphics Interface 2006.)
This paper studies jump flooding as an algorithmic paradigm in general-purpose computation with GPUs. As an example application of jump flooding, the paper discusses a constant-time algorithm on the GPU to compute an approximation to the Voronoi diagram of a given set of seeds in a 2D grid. The errors due to the differences between the approximation and the actual Voronoi diagram are hardly noticeable to the naked eye in all presented experiments. The same approach can also compute in constant time an approximation to the distance transform of a set of seeds in a 2D grid. In practice, such a constant-time algorithm is useful in many interactive applications involving, for example, rendering and image processing. Besides the experimental evidence, this paper also confirms quantitatively the effectiveness of jump flooding by analyzing the occurrences of errors. The analysis is a showcase of insights into the jump flooding paradigm, and may be of independent interest to other applications of jump flooding. (Jump Flooding in GPU with Applications to Voronoi Diagram and Distance Transform. Guodong Rong and Tiow-Seng Tan. To appear in 2006 SIGGRAPH Symposium on Interactive 3D Graphics and Games [I3D 2006].)
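A minimal CPU sketch of jump flooding for the approximate Voronoi diagram is given below. It follows the commonly described form of the algorithm (passes with step sizes n/2, n/4, ..., 1, each cell inspecting nine positions) and is not taken from the paper; the grid size `n` is assumed to be a power of two, and the names `jump_flood` and `dist2` are illustrative. On a GPU each pass would process all cells in parallel, so the number of passes depends only on the grid resolution and not on the number of seeds.

```python
def dist2(seed, x, y):
    """Squared Euclidean distance from cell (x, y) to a seed (sx, sy)."""
    return (seed[0] - x) ** 2 + (seed[1] - y) ** 2


def jump_flood(n, seeds):
    """Approximate Voronoi diagram on an n x n grid (n assumed a power of two).

    Returns owner[y][x], the seed that the cell (x, y) ended up assigned to.
    """
    owner = [[None] * n for _ in range(n)]
    for sx, sy in seeds:
        owner[sy][sx] = (sx, sy)

    step = n // 2
    while step >= 1:
        new_owner = [row[:] for row in owner]  # every pass reads the previous state
        for y in range(n):
            for x in range(n):
                best = owner[y][x]
                # Inspect the cell itself and its eight neighbours at distance `step`.
                for dy in (-step, 0, step):
                    for dx in (-step, 0, step):
                        qx, qy = x + dx, y + dy
                        if 0 <= qx < n and 0 <= qy < n:
                            cand = owner[qy][qx]
                            if cand is not None and (
                                best is None or dist2(cand, x, y) < dist2(best, x, y)
                            ):
                                best = cand
                new_owner[y][x] = best
        owner = new_owner
        step //= 2
    return owner


grid = jump_flood(8, [(0, 0), (7, 7)])
print(grid[0][1], grid[7][6])  # (0, 0) (7, 7)
```

Taking `dist2` of each cell's final owner gives the corresponding approximate (squared) distance transform.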