GPU Simulations of Gravitational Many-body Problem and GPU Octrees

January 20th, 2010

This undergraduate thesis and poster by Kazuki Fujiwara and Naohito Nakasato from the University of Aizu address a common problem in astrophysics, the gravitational many-body problem, using both brute-force and hierarchical (tree-based) methods on ATI GPUs. Abstracts:

Fast Simulations of Gravitational Many-body Problem on RV770 GPU
Kazuki Fujiwara, Naohito Nakasato (University of Aizu)
Abstract:

The gravitational many-body problem concerns the motion of bodies that interact through gravity. However, solving the gravitational many-body problem on a CPU takes a long time because of its O(N^2) computational complexity. In this paper, we show how to speed up the gravitational many-body problem by using a GPU. After extensive optimizations, the peak performance obtained so far is about 1 Tflops.
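To make the O(N^2) cost concrete, here is a minimal CPU sketch of direct summation in Python. It is not the authors' GPU code: the function name `accelerations`, the softening length `eps`, and the unit choice G = 1 are assumptions made for illustration only.

```python
import math

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-summation gravitational accelerations.

    pos  : list of [x, y, z] positions
    mass : list of masses
    eps  : softening length (hypothetical parameter, avoids division by zero)
    Every particle visits every other particle, hence N*(N-1) interactions.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc
```

Each (i, j) pair is independent of every other pair, which is what makes the double loop a natural fit for the many parallel arithmetic units of a GPU.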

Oct-tree Method on GPU
N.Nakasato
Abstract:

The kd-tree is a fundamental tool in computer science. Among others, an application of the kd-tree search (oct-tree method) to fast evaluation of particle interactions and neighbor search is highly important, since the computational complexity of these problems is reduced from O(N^2) with a brute-force method to O(N log N) with the tree method, where N is the number of particles. In this paper, we present a parallel implementation of the tree method running on a graphics processing unit (GPU). We successfully ran a simulation of structure formation in the universe very efficiently. On our system, which costs roughly $900, the run with N ~ 2.87×10^6 particles took 5.79 hours and executed 1.2×10^13 force evaluations in total. We obtained a sustained computing speed of 21.8 Gflops and a cost per Gflops of $41.6/Gflops, which is two and a half times better than the previous record in 2006.
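The reduction from O(N^2) to O(N log N) comes from replacing distant groups of particles with a single point at their centre of mass. The sketch below is a plain serial Barnes-Hut style oct-tree in Python, not the parallel GPU implementation from the abstract. The names `Node`, `build_tree`, and `accel`, the opening angle `theta`, the softening `eps`, and G = 1 are all illustrative assumptions. It also assumes positive masses and distinct particle positions.

```python
import math

class Node:
    """Cubic cell: centre, half-width, total mass and centre of mass."""
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.mass = 0.0
        self.com = [0.0, 0.0, 0.0]
        self.children = None   # 8 slots once the cell is subdivided
        self.body = None       # (position, mass) while the cell holds one particle

    def _child(self, p):
        # One bit per axis selects the octant containing p.
        idx = sum((p[k] >= self.center[k]) << k for k in range(3))
        if self.children[idx] is None:
            h = self.half / 2
            c = [self.center[k] + (h if (idx >> k) & 1 else -h) for k in range(3)]
            self.children[idx] = Node(c, h)
        return self.children[idx]

    def insert(self, p, m):
        if self.body is None and self.children is None:
            self.body = (p, m)                 # empty cell: store the particle
        else:
            if self.children is None:          # occupied leaf: split it
                self.children = [None] * 8
                old_p, old_m = self.body
                self.body = None
                self._child(old_p).insert(old_p, old_m)
            self._child(p).insert(p, m)
        total = self.mass + m                  # update the cell's monopole
        self.com = [(self.com[k] * self.mass + p[k] * m) / total for k in range(3)]
        self.mass = total

def build_tree(positions, masses):
    lo = [min(p[k] for p in positions) for k in range(3)]
    hi = [max(p[k] for p in positions) for k in range(3)]
    half = max(hi[k] - lo[k] for k in range(3)) / 2 + 1e-9
    root = Node([(lo[k] + hi[k]) / 2 for k in range(3)], half)
    for p, m in zip(positions, masses):
        root.insert(p, m)
    return root

def accel(node, p, theta=0.5, eps=1e-3):
    """Acceleration at point p (G = 1); far cells are treated as point masses."""
    if node is None:
        return [0.0, 0.0, 0.0]
    d = [node.com[k] - p[k] for k in range(3)]
    r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
    # Accept the cell if it is a leaf or looks small enough from p.
    if node.children is None or 2 * node.half < theta * math.sqrt(r2):
        if r2 == 0.0:
            return [0.0, 0.0, 0.0]             # the particle itself
        inv_r3 = 1.0 / (r2 + eps * eps) ** 1.5
        return [node.mass * x * inv_r3 for x in d]
    acc = [0.0, 0.0, 0.0]
    for child in node.children:                # otherwise open the cell
        a = accel(child, p, theta, eps)
        acc = [acc[k] + a[k] for k in range(3)]
    return acc
```

With `theta=0` no internal cell is ever accepted, so the walk degenerates to direct summation; larger values trade accuracy for fewer force evaluations. Keeping `theta` below about 0.57 ensures that a cell containing the target particle is never approximated as a distant point mass.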

SIGGRAPH Poster: Extended-Precision Floating-Point Numbers for GPU Computation

August 10th, 2006

Using unevaluated sums of paired or quadrupled single-precision (f32) values, double-float (df64) and quad-float (qf128) numeric types can be implemented on current GPUs and used efficiently and effectively for extended-precision computation in real and complex arithmetic. These numeric types provide 48 and 96 bits of precision respectively at f32 exponent ranges for computer graphics and general purpose (GPGPU) programming. Double- and quad-floats may be useful not only for extending available precision but also for accurate computation by only partially IEEE-compliant single-precision floats. The poster and demos presented at ACM SIGGRAPH 06 discussed the implementation and application of these numbers in the Cg language for real and complex GPU programming. The df64 library includes math routines for exponential, log, and trigonometric functions. The poster can be downloaded from Andrew Thall’s website. Technical details will be available shortly, and the code itself will be made available for distribution given sufficient interest.
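The idea of an "unevaluated sum" can be illustrated on the CPU. The sketch below emulates f32 arithmetic in Python by rounding every intermediate result to single precision, then builds a double-float addition from the standard error-free two-sum transformation. It is not the Cg library described in the poster: the helper names `f32`, `two_sum`, `quick_two_sum`, and `df64_add` are assumptions for illustration, and a df64 value is represented here as a plain `(hi, lo)` tuple.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free sum of two f32 values: returns (s, e) with s + e == a + b."""
    s = f32(a + b)
    v = f32(s - a)
    e = f32(f32(a - f32(s - v)) + f32(b - v))
    return s, e

def quick_two_sum(a, b):
    """Same as two_sum, but cheaper; requires |a| >= |b|."""
    s = f32(a + b)
    e = f32(b - f32(s - a))
    return s, e

def df64_add(x, y):
    """Add two double-floats, each an unevaluated (hi, lo) pair of f32 values."""
    s, e = two_sum(x[0], y[0])          # add high parts, keep the rounding error
    e = f32(e + f32(x[1] + y[1]))       # fold in the low parts
    return quick_two_sum(s, e)          # renormalise so hi carries the leading bits
```

As a usage example, `df64_add((1.0, 0.0), (2.0**-30, 0.0))` keeps the small term in the low word, whereas a single f32 addition of the same values rounds back to 1.0 because f32 has only 24 significand bits.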

SIGGRAPH Poster: GPU Histogram Computation

August 10th, 2006

This SIGGRAPH poster by Oliver Fluck et al. presents an approach to computing histograms in fragment shaders. The proposed method enables iterative and histogram-guided algorithms to run on GPUs and avoids data transfer between the GPU and main memory. The algorithm has been demonstrated using the example of a GPU level set segmentation. (GPU Histogram Computation)