This paper presents a novel algorithm for solving dense linear systems using graphics processors (GPUs). It reduces matrix decomposition and row operations to a series of rasterization problems on the GPU. The approach includes new techniques for streaming index pairs, swapping rows and columns, and parallelizing the computation to utilize multiple vertex and fragment processors. The paper describes implementations of the algorithm on different GPUs and compares their performance with optimized CPU implementations. In particular, the implementation on an NVIDIA GeForce 7800 GTX GPU outperforms a CPU-based ATLAS implementation. Moreover, the results show that the algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and the core GPU clock rate. The algorithm is demonstrated in the context of fluid flow simulation. (LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware. To appear in Proceedings of the 2005 ACM/IEEE Supercomputing Conference, November 12-18, 2005.)
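For readers unfamiliar with the underlying method, the following is a minimal CPU reference sketch of LU decomposition with partial pivoting, the kind of row swapping and elimination the abstract refers to. It is not the paper's GPU rasterization mapping; the function names `lu_decompose` and `lu_solve` are hypothetical and chosen only for illustration.

```python
def lu_decompose(a):
    """Return (lu, perm) for a square matrix given as a list of row lists.

    lu packs the unit-lower-triangular L (below the diagonal) and U (on and
    above it); perm[i] is the original index of the row now in position i.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k from every row below the pivot. Each row update
        # is independent of the others, so this step can run in parallel.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the packed factors returned by lu_decompose."""
    n = len(lu)
    # Forward substitution on the permuted right-hand side (L has unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution with U.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x
```

The inner elimination loop touches a rectangular block of the matrix at each step, which is plausibly why the work can be expressed as drawing rectangles on a GPU, though the details of that mapping are in the paper itself.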
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
August 8th, 2005
Illustrative Display of Hidden Iso-Surface Structures using GPU Processing
August 8th, 2005
This IEEE Visualization 2005 paper (accepted for publication) describes a new algorithm for the illustrative rendering of iso-surfaces and polygonal models. Using a combination of multi-pass rendering and image-space processing passes, hidden structures and optional additional inner geometry are displayed in real-time. No pre-processing of the geometric models is necessary. This work is part of Jan Fischer’s PhD thesis. (Illustrative Display of Hidden Iso-Surface Structures, Jan Fischer et al., IEEE Visualization 2005)
Data Visualization and Mining using the GPU
July 29th, 2005
Sudipto Guha, Shankar Krishnan and Suresh Venkatasubramanian are presenting a tutorial on the use of the GPU for data visualization and mining at the ACM International Conference on Knowledge Discovery and Data Mining (KDD 2005). (Data Visualization and Mining on the GPU)
Evolutionary Computation on GPUs
July 29th, 2005
Genetic Algorithms (GA) are one class of evolutionary computation (EC). A difficulty with GA is that the traditional crossover operation introduces order dependency and hence increases the number of rendering passes on SIMD GPUs. To parallelize EC on GPUs, this project proposes to use another class of EC called Evolutionary Programming (EP), which applies mutations locally. The project studies in depth how to map an EP algorithm efficiently to SIMD GPUs, including a scalable and visualizable genome map, mutation, tournament and selection, and finally convergence visualization. Extensive experiments and careful comparisons are conducted to demonstrate its speedup and accuracy. The project also shows that it is conceptually wrong and infeasible to generate high-quality random numbers on the current generation of GPUs and that low-quality random numbers lead to poor EC performance. (K. L. Fok, T. T. Wong, and M. L. Wong, “Evolutionary Computing on Consumer-Level Graphics Hardware”, IEEE Intelligent Systems, and “Parallel Evolutionary Algorithms on Graphics Processing Unit” in Proc. of IEEE Congress on Evolutionary Computation 2005.)
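To make the mutation, tournament and selection steps concrete, here is a minimal CPU sketch of a generic evolutionary-programming loop. It is an illustration under stated assumptions, not the authors' GPU implementation: the function name `evolve`, its parameters, the Gaussian mutation, and the win-counting tournament are all choices made for this example.

```python
import random


def evolve(fitness, dim, pop_size=32, generations=200, sigma=0.3,
           opponents=5, seed=0):
    """Minimise `fitness` with a basic evolutionary-programming loop."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation: each parent produces one offspring on its own, with no
        # crossover, so no individual depends on any other in this step.
        offspring = [[gene + rng.gauss(0.0, sigma) for gene in parent]
                     for parent in population]
        candidates = population + offspring
        scores = [fitness(c) for c in candidates]
        # Tournament: each candidate counts wins against random opponents.
        wins = []
        for i in range(len(candidates)):
            rivals = [rng.randrange(len(candidates)) for _ in range(opponents)]
            wins.append(sum(scores[i] <= scores[r] for r in rivals))
        # Selection: keep the pop_size candidates with the most wins,
        # breaking ties in favour of the better (lower) score.
        order = sorted(range(len(candidates)),
                       key=lambda i: (-wins[i], scores[i]))
        population = [candidates[i] for i in order[:pop_size]]
    return min(population, key=fitness)
```

Because mutation acts on each individual separately, the per-individual work has the data-parallel shape that SIMD hardware favours. The sketch also depends on a stream of random numbers at every step, which is where the quality concern raised in the abstract would come into play.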
A Survey of General-Purpose Computation on Graphics Hardware
July 1st, 2005
This new report by Owens et al. is a comprehensive survey of the history and state of the art in GPGPU. It describes, summarizes and analyzes the latest research in mapping general-purpose computation to graphics hardware. The report begins with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describes the hardware and software developments that have led to the recent interest in this field. The authors describe the techniques used in mapping general-purpose computation to graphics hardware, and survey and categorize the latest developments in general-purpose application development on graphics hardware. (A Survey of General-Purpose Computation on Graphics Hardware, by John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, Timothy J. Purcell. To appear in proceedings of Eurographics 2005, State of the Art Reports.)
High Performance Sorting on a GPU
July 1st, 2005
This paper by Govindaraju et al. describes a cache-efficient bitonic sorting algorithm on GPUs. The algorithm uses the special-purpose texture mapping and programmable hardware to sort IEEE 32-bit floating-point data including pointers, and has been used to perform stream data mining and relational database queries. Their results indicate a significant performance improvement over prior CPU-based and GPU-based sorting algorithms. (GPUSORT: A High Performance Sorting Library. Also see this Tom’s Hardware article.)
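As background on the sorting network named above, here is a minimal CPU sketch of a bitonic sort. It shows only the standard network, not the cache-efficient GPU variant the paper describes; the function name `bitonic_sort` and the power-of-two length restriction are assumptions of this example.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two, in ascending order."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    size = 2
    while size <= n:            # length of the bitonic sequences being merged
        stride = size // 2
        while stride >= 1:      # one pass: all compare-exchanges are independent
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    # Blocks alternate direction so merged runs stay bitonic.
                    ascending = (i & size) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            stride //= 2
        size *= 2
    return data
```

The comparison pattern depends only on the indices, never on the data, so every pass performs the same fixed set of compare-exchange operations. That property is what makes this family of sorts a natural fit for hardware that runs one program over many elements at once.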
Initial Experiences Porting a Bioinformatics Application to a Graphics Processor
July 1st, 2005
Bioinformatics applications are among the most compute-demanding applications today. While these applications are traditionally executed on clusters or dedicated parallel systems, this paper by M. Charalambous, P. Trancoso, and A. Stamatakis at the University of Cyprus and FORTH explores the use of an alternative architecture. The authors focus on exploiting the characteristics of graphics processors (GPUs) in order to accelerate a bioinformatics application. The paper presents initial results on porting RAxML, a bioinformatics program for phylogenetic tree inference, to the GPU. (Initial Experiences Porting a Bioinformatics Application to a Graphics Processor. M. Charalambous, P. Trancoso, and A. Stamatakis. Proceedings of the 10th Panhellenic Conference in Informatics (PCI 2005))
Radiance Cache Splatting: A GPU-Friendly Global Illumination Algorithm
June 14th, 2005
The irradiance caching algorithm is commonly used for fast global illumination since it provides high-quality rendering in a reasonable time. However, this algorithm relies on a spatial data structure along with complex algorithms. This central, constantly modified data structure prevents the algorithm from being easily implemented on GPUs. This paper proposes a novel approach to global illumination using irradiance and radiance caching: Radiance Cache Splatting. This method directly meets the processing constraints of graphics hardware since it avoids the need for complex data structures and algorithms. Moreover, the rendering quality remains identical to classical irradiance and radiance caching. This work will be presented at the Eurographics Symposium on Rendering 2005 and in the SIGGRAPH 2005 sketches. (Radiance Cache Splatting: A GPU-Friendly Global Illumination Algorithm. Pascal Gautron, Jaroslav Krivanek, Kadi Bouatouch, Sumanta Pattanaik. Proceedings of Eurographics Symposium on Rendering 2005)
Exploring Graphics Processor Performance for General Purpose Applications
June 12th, 2005
This paper by P. Trancoso and M. Charalambous at the University of Cyprus presents a comprehensive study of the performance of general-purpose applications on the GPU, and determines the conditions under which the GPU works efficiently. Also, as the GPU is cheaper and consumes less power than a high-end CPU, the authors show the benefits of using the graphics card to extend the lifetime of an existing computer system. (Exploring Graphics Processor Performance for General Purpose Applications. P. Trancoso and M. Charalambous. Proceedings of the Eighth Euromicro Conference on Digital System Design (DSD 2005))
Stack Implementation on Programmable Graphics Hardware
June 12th, 2005
This paper by Ernst et al. describes a stack implementation for the GPU using textures for storage. For a predefined maximum stack depth, k, the implementation uses either k data textures or a single large texture with k stack layers side by side. A stack pointer texture is also needed. The paper argues that both push and pop can become O(1) operations using fragment program branching. Push and pop each require a separate rendering pass. The technique is demonstrated in a kd-tree traversal implementation. (gpu stack bibtex)
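The storage layout described above can be sketched on the CPU with plain arrays standing in for textures. This is a rough illustration only, assuming the k-separate-textures variant; the class name `TextureStack` and its methods are hypothetical, and the rendering passes and fragment program branching of the actual technique are not modelled.

```python
class TextureStack:
    """Per-pixel stacks with a fixed maximum depth, stored as flat arrays."""

    def __init__(self, num_pixels, max_depth):
        self.max_depth = max_depth
        # One "data texture" per stack level; layers[d][p] is level d of pixel p.
        self.layers = [[0.0] * num_pixels for _ in range(max_depth)]
        # The "stack pointer texture": number of live entries per pixel.
        self.pointer = [0] * num_pixels

    def push(self, pixel, value):
        top = self.pointer[pixel]
        if top >= self.max_depth:
            raise OverflowError("stack depth exceeded")
        # Write only the layer selected by the stack pointer: O(1) per pixel.
        self.layers[top][pixel] = value
        self.pointer[pixel] = top + 1

    def pop(self, pixel):
        top = self.pointer[pixel]
        if top == 0:
            raise IndexError("pop from empty stack")
        self.pointer[pixel] = top - 1
        return self.layers[top - 1][pixel]
```

Each pixel owns an independent stack, so one pixel can be several levels deep in a traversal while its neighbour is empty. Only the layer indexed by the stack pointer is touched per operation, which corresponds to the constant-time behaviour the paper argues for.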