Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
July 15th, 2004
Modern GPUs perform floating point math and read data from off-chip memory at rates roughly five times that of a fast Pentium 4 CPU. However, the performance of algorithms for computing dense matrix-matrix products on GPUs has lagged behind that of good CPU implementations. In this paper, we show why this result is not an artifact of poorly designed algorithms, and explain how present-day graphics architectures are highly inefficient for computations such as matrix-matrix multiplication that involve significant data reuse. (Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication. Kayvon Fatahalian, Jeremy Sugerman, and Pat Hanrahan.)
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware
August 29th, 2003
This GH2003 paper by Goodnight et al. at the University of Virginia demonstrates an implementation of the multigrid method for solving boundary value problems, such as the systems of partial differential equations that arise in physical simulation problems like fluid flow, heat transfer, and tone mapping. This paper is an expanded, more detailed version of the authors’ earlier tech report. (A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware. Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, and Greg Humphreys. To appear in the proceedings of Graphics Hardware 2003.)
Linear Algebra Operators for GPU Implementation of Numerical Algorithms
August 29th, 2003
This paper describes a framework for the implementation of linear algebra operators on GPUs, providing the building blocks for the design of more complex numerical algorithms. The framework takes advantage of sparse and banded matrices in particular. The paper demonstrates the approach by implementing direct solvers for sparse matrices with application to multi-dimensional finite difference equations, i.e. the 2D wave equation and the incompressible Navier-Stokes equations. (Linear Algebra Operators for GPU Implementation of Numerical Algorithms. Jens Krüger and Rüdiger Westermann. To appear in the proceedings of SIGGRAPH 2003.)
Dense Matrix Algebra on the GPU
June 18th, 2003
This paper by Ádám Moravánszky, from the upcoming book ShaderX 2 Programming, gives a detailed description of implementing dense matrix operations on programmable GPUs. Matrix multiplication is applied to solving linear systems of equations and the linear complementarity problem, which can in turn be used to simulate soft body and rigid body physics. The performance of the GPU implementation is compared to the SSE2-optimized ATLAS library running on the CPU. DirectX 9 pixel and vertex shader programs are provided. (Dense Matrix Algebra on the GPU. Ádám Moravánszky. To appear in ShaderX 2 Programming, Wordware, 2003.)
Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
April 18th, 2003
This paper by Bolz et al. of Caltech presents two basic, broadly useful computational kernels implemented on GPUs: a sparse matrix conjugate gradient solver and a regular-grid multigrid solver. The paper demonstrates a prototype implementation on NVIDIA’s GeForce FX, using geometric flow (cube smoothing movie, 3D photography scan denoising movie) and fluid simulation (particle advection movie) as application examples. (Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Jeff Bolz, Ian Farmer, Eitan Grinspun and Peter Schröder. To appear in the proceedings of SIGGRAPH 2003.)
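For readers unfamiliar with the first of these kernels, the following is a minimal CPU-side sketch of a sparse conjugate gradient solver in plain Python. It is not the authors' GPU implementation: the storage format (one `{column: value}` dict per row), the function names, and the 1D Laplacian test matrix are all assumptions chosen for illustration.

```python
def sparse_matvec(rows, x):
    """Multiply a sparse matrix (one {column: value} dict per row) by vector x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite sparse matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Ap = sparse_matvec(rows, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update weight
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) 1D Poisson stencil."""
    rows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    A = laplacian_1d(8)
    b = [1.0] * 8
    x = conjugate_gradient(A, b)
    residual = max(abs(ai - bi) for ai, bi in zip(sparse_matvec(A, x), b))
    print("max residual:", residual)
```

Each iteration consists of one sparse matrix-vector product plus a few inner products and vector updates, which is why these operations are natural candidates for data-parallel hardware.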
A Multigrid Solver for Boundary Value Problems Using Graphics Hardware
February 20th, 2003
This paper by Goodnight et al. at the University of Virginia demonstrates an implementation on two modern graphics architectures of the multigrid method for solving boundary value problems, such as the systems of partial differential equations that arise in physical simulation problems like fluid flow and heat transfer. (A Multigrid Solver for Boundary Value Problems Using Graphics Hardware. Nolan Goodnight, Gregory Lewin, David Luebke, and Kevin Skadron, University of Virginia Technical Report CS-2003-03 (January 2003).)
Fast Matrix Multiplies using Graphics Hardware
November 14th, 2002
Scott Larsen and David McAllister of UNC Chapel Hill describe the use of GPUs to perform large matrix-matrix multiplies. (Fast Matrix Multiplies using Graphics Hardware. E. Scott Larsen, David K. McAllister. Supercomputing 2001 (Denver, CO) November, 2001.)
Using Modern Graphics Architectures for General-Purpose Computing: A Framework and Analysis
November 14th, 2002
A paper by Thompson et al. of the University of Washington. From the abstract: “We develop a programming framework and apply it to a variety of problems, including matrix multiplication and 3-SAT.” (Using Modern Graphics Architectures for General-Purpose Computing: A Framework and Analysis. Chris J. Thompson, Sahngyun Hahn, and Mark Oskin. International Symposium on Microarchitecture (MICRO), Turkey, Nov. 2002.)