This paper by Fan et al. at Stony Brook University presents the use of a cluster of commodity GPUs for high-performance scientific computing. As an example application, the authors have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, their simulation computes a 480 x 400 x 80 LBM in 0.31 seconds per step, which is 4.6 times faster than their previous CPU cluster implementation. Besides the LBM, the paper also discusses other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM. (Zhe Fan, Feng Qiu, Arie Kaufman, Suzanne Yoakum-Stover, GPU Cluster for High Performance Computing, to appear in Proceedings of the ACM/IEEE SuperComputing 2004 (SC’04), November 2004)
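To put the reported figures in perspective, the following back-of-the-envelope sketch derives the implied throughput. It uses only the numbers quoted above; the variable names are illustrative, and it assumes that "4.6 times faster" refers to the ratio of per-step times.

```python
# Figures quoted in the summary above.
NX, NY, NZ = 480, 400, 80      # lattice dimensions
SECONDS_PER_STEP = 0.31        # GPU cluster time per simulation step
GPU_NODES = 30
SPEEDUP_OVER_CPU = 4.6         # assumed to be a ratio of per-step times

# Total lattice cells updated in each step.
cells = NX * NY * NZ

# Aggregate and per-node update rates implied by the step time.
cells_per_second = cells / SECONDS_PER_STEP
cells_per_second_per_node = cells_per_second / GPU_NODES

# Implied step time of the earlier CPU cluster implementation.
cpu_seconds_per_step = SECONDS_PER_STEP * SPEEDUP_OVER_CPU

print(f"{cells:,} cells per step")
print(f"{cells_per_second / 1e6:.1f} million cell updates per second in total")
print(f"{cells_per_second_per_node / 1e6:.2f} million per GPU node")
print(f"about {cpu_seconds_per_step:.2f} s per step on the CPU cluster")
```

These are derived estimates rather than figures reported in the paper.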
GPU Cluster for High Performance Computing
August 19th, 2004
SIMD Optimization of Linear Expressions for Programmable Graphics Hardware
August 19th, 2004
Linear expressions constitute one of the most basic operations in scientific computations. This paper by Bajaj et al. proposes a SIMD code optimization technique that enables efficient shader code to be generated for evaluating linear expressions. Performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. The authors demonstrate that this technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications. (SIMD Optimization of Linear Expressions for Programmable Graphics Hardware. C. Bajaj, I. Ihm, J. Min, and J. Oh)
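The benefit of packing can be illustrated with a minimal sketch. This is not the paper's optimizer: `mad4` is a hypothetical stand-in for a four-wide multiply-add instruction, and the example only shows how reordering four linear expressions by column reduces 16 scalar multiply-adds to 4 four-wide ones.

```python
def mad4(a, s, acc):
    """Hypothetical four-wide multiply-add: acc + a * s, lane by lane."""
    return [acc[i] + a[i] * s for i in range(4)]


def eval_scalar(rows, x):
    """Evaluate each expression separately; returns (results, scalar op count)."""
    results, ops = [], 0
    for row in rows:
        total = 0.0
        for coeff, xj in zip(row, x):
            total += coeff * xj   # one scalar multiply-add
            ops += 1
        results.append(total)
    return results, ops


def eval_packed(rows, x):
    """Reorder by column so each instruction advances all four expressions."""
    columns = [list(col) for col in zip(*rows)]
    acc, ops = [0.0] * 4, 0
    for col, xj in zip(columns, x):
        acc = mad4(col, xj, acc)  # one four-wide multiply-add
        ops += 1
    return acc, ops


rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 0, -1, 2]
scalar_result, scalar_ops = eval_scalar(rows, x)
packed_result, packed_ops = eval_packed(rows, x)
```

Both versions compute the same four values; the packed version issues a quarter of the instructions because every lane of each instruction does useful work.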
GPU Gems 2 Call For Participation
July 31st, 2004
Following the success of GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics, NVIDIA have decided to produce a second GPU Gems volume in order to showcase the best new ideas and techniques for the latest programmable GPUs. Tentatively titled GPU Gems II: Techniques for Graphics and Compute Intensive Programming, this book will be edited by Matt Pharr, software engineer at NVIDIA.
NVIDIA are looking for ideas from developers who are using GPUs in new ways to create stunning graphics and cutting-edge applications. Chapters should present techniques and ideas that are broadly useful to GPU programmers and can be integrated into their applications. GPU Gems II will have an increased focus on chapters exploring non-graphics applications of the computational capabilities of GPU hardware.
To participate, read the submission guidelines and send an e-mail to articlesubmissions@nvidia.com with your proposed chapter title as the subject line, and the required description in the e-mail body. The deadline for submissions is Monday, August 16, 2004.
Apple’s Core Image Framework for GPUs
July 19th, 2004
At its Worldwide Developers Conference, Apple introduced Core Image as a feature of its upcoming Tiger release. Core Image is a framework for image processing on the GPU using a modified stream processing paradigm. It is an interesting computational framework for offloading some general-purpose computations onto the GPU, and it appears to be the first commercial effort to offer a general image computing environment for GPUs. The library comes with 100 basic plugins, called “Image Units”, and can be extended by developers. The computing model is based on stream processing, where each kernel is expressed in a high-level language and computes a result image based on some number of input images. The kernels can be strung together in arbitrary image computation “graphs”, in a model similar to that described by Michael Shantzis in his 1994 paper A Model for Efficient and Flexible Image Computing. Registered Apple Developers (free registration) can access a pre-release version of Core Image.
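The kernel-graph model described above can be sketched in a few lines. This is an illustration of the general idea only, not the Core Image API: the `Node` class, its `render` method, and the kernels are all hypothetical, and images are plain nested lists of grayscale values.

```python
class Node:
    """Hypothetical graph node: a per-pixel kernel applied to input images."""

    def __init__(self, kernel, *inputs):
        self.kernel = kernel    # function of one pixel value per input image
        self.inputs = inputs    # raw images (nested lists) or other Nodes

    def render(self):
        # Evaluate upstream nodes first, then apply this kernel at every pixel.
        images = [i.render() if isinstance(i, Node) else i for i in self.inputs]
        height, width = len(images[0]), len(images[0][0])
        return [
            [self.kernel(*(img[y][x] for img in images)) for x in range(width)]
            for y in range(height)
        ]


a = [[0.0, 0.5], [1.0, 0.25]]
b = [[1.0, 0.5], [0.0, 0.75]]

# A two-node graph: average two images, then invert the result.
blend = Node(lambda p, q: (p + q) / 2, a, b)
inverted = Node(lambda p: 1.0 - p, blend)
result = inverted.render()
```

Nothing is computed until `render` is called on the final node, which is what lets a framework of this kind analyze the whole graph before running it on the GPU.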
Beyond Triangles: A Simple Framework For Hardware-Accelerated Non-Triangular Primitives
July 19th, 2004
This paper presents an extensible system for interactively rendering multiple types of ray-cast objects in a manner compatible with pre-existing rendering engines. The sample implementation includes support for general quadrics and volumetric isosurfaces. It also includes a high-speed sphere renderer and a standard triangle-rendering pipeline. The system is designed so that most of the algorithms designed to run on the existing raster engine can be added with minimal overhead and coding effort. We have demonstrated shadowing using the shadow-map algorithm. (“Beyond Triangles: A Simple Framework For Hardware-Accelerated Non-Triangular Primitives”, to be submitted for publication.)
Hardware Acceleration for Spatial Database Operations
July 19th, 2004
These works from the Database Systems Lab at UC Santa Barbara describe how a graphics processor can be used effectively to accelerate spatial database (GIS database) operations. Spatial database operations, especially those that involve polygon datasets, are known to be computationally expensive. Sun et al. describe a novel hardware/software co-processing technique that uses basic features of a GPU to reduce the spatial query processing cost. Experimental evaluation shows that their hardware-based approach can significantly outperform leading software-based techniques. (Hardware Acceleration for Spatial Selections and Joins. Chengyu Sun, Divyakant Agrawal, Amr El Abbadi. Proceedings of SIGMOD 2003.) However, this evaluation is done in a stand-alone setting without the indices, preprocessing, or other optimizations available in a database. Bandi et al. extend Sun et al.’s work and integrate the hardware-based technique into a popular commercial database. Rigorous experimentation over real-life data sets shows that the hardware-based approach is very effective and can be complementary to the optimizations available in a commercial database setting. (Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations. Nagender Bandi, Chengyu Sun, Divyakant Agrawal, Amr El Abbadi, to appear in VLDB 2004.)
Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
July 15th, 2004
Modern GPUs perform floating point math and read data from off-chip memory at rates roughly five times that of a fast Pentium 4 CPU. However, the performance of algorithms for computing dense matrix-matrix products on GPUs has lagged behind that of good CPU implementations. In this paper, we show why this result is not an artifact of poorly designed algorithms, and explain how present-day graphics architectures are highly inefficient for computations such as matrix-matrix multiplication that involve significant data reuse. (Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication. Kayvon Fatahalian, Jeremy Sugerman, and Pat Hanrahan.)
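Why data reuse matters can be seen from a simple counting model. The sketch below is not the paper's analysis; it is a simplified illustration that counts only operand loads (ignoring writes of the result) and compares a multiplication with no reuse against one that keeps b x b blocks in fast on-chip storage. The function names and the block-size parameter are assumptions made for the example.

```python
def flops_per_load_no_reuse(n):
    """Every multiply-add fetches both of its operands from memory."""
    flops = 2 * n**3          # n^3 multiplies plus n^3 adds
    loads = 2 * n**3          # two operand loads per multiply-add
    return flops / loads


def flops_per_load_blocked(n, b):
    """Each b x b block pair is loaded once and reused for a block product."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    block_products = (n // b) ** 3
    flops = block_products * 2 * b**3      # work per block product
    loads = block_products * 2 * b * b     # one A block and one B block
    return flops / loads


no_reuse = flops_per_load_no_reuse(1024)
blocked = flops_per_load_blocked(1024, 16)
```

In this model the arithmetic done per element fetched grows linearly with the block size, so hardware that cannot hold and reuse blocks on chip is limited by memory bandwidth rather than by its arithmetic rate.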
Simulating Photon Mapping for Real-time Applications
June 11th, 2004
This paper by Larsen et al. at the Technical University of Denmark introduces a fast GPU-accelerated technique for simulating photon mapping. Each step in the photon mapping algorithm is executed on either the CPU or the GPU, depending on which processor is more appropriate for the task. The indirect illumination is calculated using a new GPU-accelerated final gathering method. Caustic photons are traced on the CPU, drawn as points in the framebuffer, and finally filtered using the GPU. Both diffuse and non-diffuse surfaces are handled by calculating the direct illumination on the GPU and the photon tracing on the CPU. (Simulating Photon Mapping for Real-time Applications. Bent D. Larsen, Niels J. Christensen, to appear at Eurographics Symposium on Rendering, 2004.)
Fast Database Operations using Graphics Processors
June 11th, 2004
This paper by Govindaraju et al. describes new algorithms for performing fast computation of several common database operations on commodity graphics processors. Specifically, the paper considers operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. The proposed algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. These algorithms have been implemented on a programmable GPU (e.g. NVIDIA’s GeForce FX 5900) and applied to databases consisting of up to a million records. The paper compares their performance with an optimized CPU-based implementation. The experiments indicate that the graphics processor available on commodity computer systems is an effective coprocessor for performing database operations. (Fast Database Operations using Graphics Processors. Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming C. Lin, Dinesh Manocha, to appear at SIGMOD 2004.)
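As a rough illustration of a conjunctive selection followed by an aggregation without rearranging data, the sketch below evaluates one predicate per pass over a column and narrows a per-record mask each time. It runs on the CPU and is not the paper's GPU algorithm; the function names, the column layout, and the sample data are all hypothetical.

```python
def conjunctive_select(columns, predicates):
    """AND together one predicate per pass; records never move."""
    n = len(next(iter(columns.values())))
    mask = [True] * n
    for attribute, predicate in predicates:
        column = columns[attribute]
        # One data-parallel pass: each record is tested independently.
        mask = [keep and predicate(value) for keep, value in zip(mask, column)]
    return mask


def count_selected(mask):
    """Aggregation over the mask: number of records passing every predicate."""
    return sum(mask)


# Hypothetical table stored column by column.
columns = {"age": [25, 40, 31, 58], "salary": [30, 80, 55, 20]}
predicates = [("age", lambda v: v > 30), ("salary", lambda v: v > 50)]

mask = conjunctive_select(columns, predicates)
selected = count_selected(mask)
```

Each pass touches every record in place and only updates the mask, which is the property that makes this style of query a natural fit for hardware that processes many independent elements at once.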
Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards
May 24th, 2004
This paper explores the plausibility of using the GPU for numerical simulations on structured grids (lattices). The paper (1) reviews previous work on using GPUs for non-graphics applications, (2) implements probability-based simulations on the GPU, namely the Ising and percolation models, (3) implements vector operation benchmarks for the GPU, and (4) compares CPU and GPU performance. The original contribution of this work is implementing Monte Carlo type simulations on the GPU. Such simulations have a wide range of applications. They are computationally intensive and, as shown in the paper, lend themselves naturally to implementation on GPUs, providing a computational speedup. A general conclusion from the results obtained is that moving computations from the CPU to the GPU is feasible, yielding good time and price performance for certain lattice computations. Preliminary results also show that it is feasible to use GPUs in parallel. (S. Tomov, M. McGuigan, R. Bennett, G. Smith, J. Spiletic. Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards, to appear in Computers & Graphics.)
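To show why lattice Monte Carlo simulations map well onto data-parallel hardware, here is a minimal CPU sketch of a 2D Ising model with Metropolis updates in a checkerboard order. The summary does not say which update scheme the paper uses, so the checkerboard ordering is an assumption; it is a common choice because sites of one color have no neighbors of the same color and can therefore be updated independently. The coupling constant is taken as 1 and the lattice side must be even.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep over an n x n periodic lattice (n even)."""
    n = len(spins)
    for color in (0, 1):
        # All sites of one color are independent of each other, so this
        # double loop could run as a single data-parallel pass.
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != color:
                    continue
                neighbors = (
                    spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                    + spins[i][(j - 1) % n] + spins[i][(j + 1) % n]
                )
                delta_e = 2 * spins[i][j] * neighbors  # energy cost of a flip
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def magnetization(spins):
    """Mean spin value over the lattice."""
    n = len(spins)
    return sum(sum(row) for row in spins) / (n * n)


rng = random.Random(0)
cold = [[1] * 8 for _ in range(8)]
for _ in range(5):
    checkerboard_sweep(cold, beta=10.0, rng=rng)   # very low temperature

hot = [[1] * 8 for _ in range(8)]
for _ in range(5):
    checkerboard_sweep(hot, beta=0.01, rng=rng)    # very high temperature
```

At very low temperature the ordered lattice stays ordered, while at high temperature flips are accepted almost freely; comparing `magnetization(cold)` with `magnetization(hot)` shows the difference.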