Abstracts, citations and links to author homepages of eight papers on GPGPU presented at the ICCS conference, Reading, UK, May 2006, are available. Topics include genome sequencing, GPGPU languages, database operations, computational fluid dynamics, computer vision, computational geometry and neural networks. (http://www.mathematik.uni-dortmund.de/~goeddeke/iccs/papers.html)
Rapid evolution of GPUs in performance, architecture, and programmability provides general and scientific computational potential beyond their primary purpose, graphics processing. This work presents an efficient algorithm for solving symmetric and positive definite linear systems using the GPU. Using the decomposition algorithm and other basic building blocks for linear algebra on the GPU, the paper demonstrates a GPU-powered linear program solver based on a Primal-Dual Interior-Point Method. (Cholesky Decomposition and Linear Programming on a GPU, Jin Hyuk Jung, Scholarly Paper Directed by Dianne P. O’Leary, Department of Computer Science, University of Maryland, 2006.)
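For readers new to the decomposition mentioned above, here is a minimal CPU-side sketch of a Cholesky factorization followed by the two triangular solves. It is plain Python on nested lists, not the paper's GPU implementation; the function names `cholesky` and `solve_spd` are hypothetical and chosen only for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T.

    `a` is assumed to be a symmetric positive definite matrix
    given as a list of rows.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of earlier columns.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def solve_spd(a, b):
    """Solve a x = b via Cholesky: first L y = b, then L^T x = y."""
    l = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution with L^T
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x
```

An interior-point solver would typically call a routine like `solve_spd` once per iteration on its normal-equations matrix, which is why the factorization is the natural building block to accelerate.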
This paper by Govindaraju et al. describes a high-performance FFT algorithm on GPUs. The algorithm is highly tuned for GPUs using memory optimizations. It further improves performance using pipelining strategies. In practice, it is able to achieve 4x higher computational performance on a $500 NVIDIA GPU than optimized single precision FFT algorithms on high-end CPUs costing $1500. (“Efficient memory model for scientific algorithms on graphics processors”, Naga Govindaraju, Scott Larsen, Jim Gray and Dinesh Manocha, UNC Tech. Report 2006)
Universal employment of modern graphics hardware by the example of the optimization of a speech recognition system
May 24th, 2006
In this master's thesis by Christian Fenzl (completed at the University of Applied Sciences in Darmstadt), an easy-to-use framework is implemented, with additional demos that show the main concepts of GPGPU. A further demo implementation calculates scores on feature vectors used in a speech recognition system, about 12 times faster than an equivalent CPU implementation. An application with several demos using the framework, including the fully documented source code (English) and the paper itself (German), is available. The framework code is recommended especially for GPGPU beginners who want to look into the OpenGL and DirectX code, which shows how GPGPU programs can be developed.
This paper by Dietz et al. from ICCS 2006 details and microbenchmarks the use of pairs of native precision values to obtain higher accuracy results using DSP, SWAR, and GPU hardware. It also discusses a way to speculatively use lower precision, recomputing with higher precision only when accuracy constraints are not met. (Floating-Point Computation with Just Enough Accuracy)
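The idea of carrying a value as a pair of native-precision numbers can be sketched with Knuth's error-free two-sum transformation. The snippet below is an illustration under assumptions, not the paper's code: it uses Python floats (IEEE doubles) where GPU code of that era would use 32-bit floats, and the helper names `two_sum`, `pair_add` and `pair_sum` are hypothetical.

```python
def two_sum(a, b):
    """Knuth's two-sum: return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def pair_add(x, y):
    """Add two (hi, lo) pairs and return a renormalised (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])  # exact sum of the high parts
    e += x[1] + y[1]            # fold in the low parts (rounded)
    return two_sum(s, e)        # renormalise so hi carries the leading bits


def pair_sum(values):
    """Accumulate plain floats into a (hi, lo) pair."""
    acc = (0.0, 0.0)
    for v in values:
        acc = pair_add(acc, (v, 0.0))
    return acc


# Plain double arithmetic loses the 1.0 here; the pair representation keeps it.
naive = (1e16 + 1.0) - 1e16
hi, lo = pair_sum([1e16, 1.0, -1e16])
```

The low word holds the rounding error that a single native value would discard, at the cost of several extra additions per operation.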
To focus and facilitate research on real-time ray tracing, a new forum is being created for this rapidly developing field: the 2006 IEEE Symposium on Interactive Ray Tracing, sponsored by the IEEE Computer Society and the IEEE Visualization and Graphics Technical Committee (pending). The Call For Participation is now online and contributions on Ray Tracing on GPUs are invited.
This workshop, to be held May 23-24 2006 in Chapel Hill, North Carolina, will address some recent developments on new commodity architectures, including GPUs, multi-core CPUs, the Cell processor, PPU and other emerging commodity architectures. Some of the issues to be examined in the workshop include the software challenges that arise in programming these new commodity architectures and their impact on different applications and high-performance computing. The workshop will bring together leading researchers and designers from academia, research labs, industrial organizations and federal agencies. A call for posters is now online. (EDGE Workshop 2006)
GPUTeraSort sorts billion-record wide-key databases using the data and task parallelism on the graphics processing unit (GPU) to perform memory-intensive and compute-intensive tasks while the CPU performs I/O and resource management. It exploits both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface to achieve higher aggregate memory bandwidth than purely CPU-based algorithms. It also pipelines disk transfers to achieve near-peak I/O performance. GPUTeraSort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. We tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with a $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny, which is the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks. (GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management. Naga K. Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. Proceedings of ACM SIGMOD 2006.)
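The two-phase structure described in the abstract (sort chunks into runs, then merge the runs) is the classic external merge sort. A minimal in-memory sketch follows; it assumes Python's built-in `sorted` as a stand-in for the GPU sort and keeps the runs in lists rather than on disk. The names `make_runs` and `merge_runs` are hypothetical.

```python
import heapq


def make_runs(records, run_size, key):
    """Phase 1: split the input into chunks and sort each chunk into a run."""
    runs = []
    for start in range(0, len(records), run_size):
        chunk = records[start:start + run_size]
        # In GPUTeraSort this step runs on the GPU; sorted() is only a stand-in.
        runs.append(sorted(chunk, key=key))
    return runs


def merge_runs(runs, key):
    """Phase 2: k-way merge of the sorted runs into one sorted sequence."""
    return list(heapq.merge(*runs, key=key))


records = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]
by_key = lambda record: record[0]
runs = make_runs(records, run_size=2, key=by_key)
merged = merge_runs(runs, key=by_key)
```

In a real system each run would be written to disk after phase 1 and streamed back during phase 2, so that only one chunk at a time has to fit in memory.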
At GDC 2006 in San Jose next week, Havok will announce Havok FX, a game physics framework for GPUs. There are two talks about Havok FX:
Havok FX: GPU-accelerated Physics for PC Games
Speaker: Andrew Bond (Havok)
This session introduces Havok’s latest innovation for game physics: Havok FX, which enables real-time processing of thousands of rigid-body objects on current and next generation GPUs. Havok’s general approach to GPU Effects Physics will be covered, as well as tool-chain requirements and trade-offs with game-critical, game-play physics processing on the CPU.
Physics Simulation on NVIDIA GPUs
Speakers: Simon Green, Mark Harris (NVIDIA)
Havok FX leverages state of the art software and hardware technology from NVIDIA to extend the capabilities of NVIDIA GPUs and SLI multi-GPU systems to include physics processing for massive real-time effects. In this presentation NVIDIA and Havok engineers will describe how Havok FX utilizes NVIDIA technology to simulate and render thousands of particles and rigid bodies in games. Live real-time demos will demonstrate the high performance available with current GPUs and provide a look into the future of physics processing on NVIDIA GPUs.
Using the GPU to accelerate ray tracing may seem like a natural choice due to the highly parallel nature of the problem. However, determining the most versatile GPU data structure for scene storage and traversal is a challenge. In this paper, we introduce a new method for quick intersection of triangular meshes on the GPU. The method uses a threaded bounding volume hierarchy built from a geometry image, which can be efficiently traversed and constructed entirely on the GPU. This acceleration scheme is highly competitive with other GPU ray tracing methods, while allowing for both dynamic geometry and an efficient level of detail scheme at no extra cost. (Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images. Nathan A. Carr, Jared Hoberock, Keenan Crane, and John C. Hart. To appear in Proceedings of Graphics Interface 2006.)
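A threaded hierarchy replaces the traversal stack with precomputed links, which suits hardware without per-thread stacks. The sketch below shows the general idea only and makes several simplifying assumptions that are not from the paper: intervals on a line stand in for bounding boxes, a point query stands in for a ray, and the tree is built by a median split rather than from a geometry image. The names `build` and `query` are hypothetical.

```python
def build(intervals, ids, nodes):
    """Append the subtree covering intervals[ids] to `nodes` in depth-first order."""
    node = {
        "lo": min(intervals[i][0] for i in ids),
        "hi": max(intervals[i][1] for i in ids),
        "prim": -1,   # primitive index for leaves, -1 for inner nodes
        "skip": -1,   # miss link, filled in once the subtree is complete
    }
    nodes.append(node)
    if len(ids) == 1:
        node["prim"] = ids[0]
    else:
        # Median split on interval centres (an arbitrary choice for this sketch).
        ids = sorted(ids, key=lambda i: intervals[i][0] + intervals[i][1])
        mid = len(ids) // 2
        build(intervals, ids[:mid], nodes)
        build(intervals, ids[mid:], nodes)
    node["skip"] = len(nodes)  # index of the first node after this subtree


def query(nodes, x):
    """Stackless traversal: return indices of all intervals containing x."""
    hits = []
    i = 0
    while i < len(nodes):
        node = nodes[i]
        if node["lo"] <= x <= node["hi"]:
            if node["prim"] >= 0:
                hits.append(node["prim"])
            i += 1              # hit link: next node in depth-first order
        else:
            i = node["skip"]    # miss link: jump past the whole subtree
    return hits


intervals = [(0, 2), (1, 3), (5, 6), (8, 9)]
nodes = []
build(intervals, list(range(len(intervals))), nodes)
```

Because the nodes sit in one flat array and the loop needs only a single index as state, the same traversal maps naturally onto a texture lookup loop in a fragment program.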