Recently, ray tracing on consumer level graphics hardware has been introduced. So far, most published studies on this topic use the uniform grid spatial subdivision structure for reducing the number of ray/triangle intersection tests. For many types of scenes, a hierarchical acceleration structure is more appropriate. This thesis by Lars Ole Simonsen and Niels Thrane of University of Aarhus compares GPU based traversal of kd-trees and uniform grids with a novel bounding volume hierarchy traversal scheme. The three implementations are compared in terms of performance and usefulness on the GPU. The thesis concludes that on the GPU, the bounding volume hierarchy traversal technique is up to 9 times faster than its implementations of uniform grid and kd-tree. Additionally, this technique proves the simplest to implement and the most memory efficient. (Lars Ole’s Website or Direct link to thesis PDF.)

## A Comparison of Acceleration Structures for GPU Assisted Ray Tracing

August 24th, 2005## Accelerating Double-Precision FEM Simulations with GPUs

August 23rd, 2005This paper by Dominik Göddeke, Robert Strzodka and Stefan Turek describes a preliminary algorithm to achieve double precision results by adding a CPU-based defect correction to iterative linear system solvers on the GPU. We demonstrate that identical accuracy as compared to a full CPU double precision solver is possible while still gaining a factor of 2 in speedup compared to a highly tuned cache-aware CPU reference implementation in double precision. (Accelerating Double Precision FEM Simulations with GPUs. Dominik Göddeke, Robert Strzodka and Stefan Turek. To appear in Proceedings of ASIM 2005 – 18th Symposium on Simulation Technique.)

## Caustics Mapping: An Image-space Technique for Real-time Caustics

August 11th, 2005Caustics are complex patterns of shimmering light formed due to reflective and refractive objects; for example, those formed on the floor of a swimming pool. Caustics Mapping is a physically based real-time caustics rendering algorithm. It utilizes the concept of backward ray-tracing, however it involves no expensive computations that are generally associated with ray-tracing and other such techniques. The main advantage of caustics mapping is that it is extremely practical for games and other interactive applications because of its high frame rates. Furthermore, the algorithm runs entirely on graphics hardware, which leaves the CPU free for other computation. There is no pre-computation involved, and therefore fully dynamic geometry, lighting, and viewing directions are supported. In addition, there is no limitation on the topology of the reciever geometry, i.e., caustics can be formed on arbitrary surfaces. (Caustics Mapping: An Image-space Technique for Real-time Caustics. Musawir A. Shah and Sumanta Pattanaik. Technical Report, School of Engineering and Computer Science, University of Central Florida, CS TR 50-07, 07/29/2005 (Submitted for Publication))

## ClawHMMer: A Streaming HMMer-Search Implementation

August 11th, 2005Many current and upcoming architectures offering large amounts of computational power are designed with data-parallel execution and streaming in mind. We present a streaming algorithm for evaluating an HMM’s Viterbi probability and refine it for the specific HMM used in biological sequence search. We implement our streaming algorithm in the Brook language, allowing us to execute the algorithm on graphics processors. We demonstrate that this streaming algorithm on graphics processors can outperform available CPU implementations. We also demonstrate this implementation running on a 16-node graphics cluster. (ClawHMMer: A Streaming HMMer-Search Implementation. Daniel Horn, Mike Houston, and Pat Hanrahan. *Proceedings of Supercomputing 2005*.)

## Dynamic LOD on the GPU

August 11th, 2005To implement dynamic LOD on the GPU, a quadtree structure is created based on a seamless geometry image atlas,and all the nodes in the quadtree are packed into the atlas textures. There are two passes in the approach. In the first pass, the LOD selection is performed in fragment shaders. The resultant buffer is taken as the input texture to the second pass by vertex texturing, and node culling and triangulation are performed in vertex shaders. The LOD algorithm can generate adaptive meshes dynamically, and can be fully implemented on the GPU. It improves the efficiency of LOD selection, and reduces computing load on CPU. (Dynamic LOD on GPU. Junfeng Ji, Enhua Wu, Sheng Li, and Xuehui Liu. *Proceedings of Computer Graphics International 2005*.)

## LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware

August 8th, 2005This paper presents a novel algorithm for solving dense linear systems using graphics processors (GPUs). It reduces matrix decomposition and row operations to a series of rasterization problems on the GPU. These include new techniques for streaming index pairs, swapping rows and columns and parallelizing the computation to utilize multiple vertex and fragment processors. The paper describes implementation of the algorithm on different GPUs and compares the performance with optimized CPU implementations. In particular, implementation on an NVIDIA GeForce 7800 GTX GPU outperforms a CPU-based ATLAS implementation. Moreover, the results show that the algorithm is cache and bandwidth efficient and scales well with the number of fragment processors within the GPU and the core GPU clock rate. The algorithm is demonstrated in the context of fluid flow simulation. (LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware To appear in Proceedings of the 2005 ACM/IEEE Super Computing Conference. November 12-18, 2005.)

## Illustrative Display of Hidden Iso-Surface Structures using GPU Processing

August 8th, 2005This IEEE Visualization 2005 paper (accepted for publication) describes a new algorithm for the illustrative rendering of iso-surfaces and polygonal models. Using a combination of multi-pass rendering and image-space processing passes, hidden structures and optional additional inner geometry are displayed in real-time. No pre-processing of the geometric models is necessary. This work is part of Jan Fischer’s PhD thesis. (Illustrative Display of Hidden Iso-Surface Structures, Jan Fischer et al., *IEEE Visualization 2005*)

## Data Visualization and Mining using the GPU

July 29th, 2005Sudipto Guha, Shankar Krishnan and Suresh Venkatasubramanian are presenting a tutorial on the use of the GPU for data visualization and mining at the ACM International Conference on Knowledge Discovery and Data Mining (KDD 2005). (Data Visualization and Mining on the GPU)

## SIGGRAPH 2005 GPGPU Course

July 29th, 2005Once again this year ACM SIGGRAPH will feature a full-day course titled “GPGPU: General-Purpose Computing on Graphics Hardware”. The course, organized by Mark Harris of NVIDIA and David Luebke of the University of Virginia, will feature GPGPU experts from industry and academia. The course will discuss core computational building blocks such as sorting, searching, and linear algebra, using case studies ranging from adaptive shadow mapping to database queries and data mining. Particular focus will be given to tools, perils, and tricks of the trade in general-purpose GPU programming. The course has been updated from SIGGRAPH 2004, with all new case studies. (http://www.gpgpu.org/s2005)

## Evolutionary Computation on GPUs

July 29th, 2005Genetic Algorithms (GA) comprise a class of evolutionary computation (EC). A difficulty with GA is that the traditional crossover operation introduces order-dependency and hence an increase in rendering passes on SIMD GPUs. To parallelize EC on GPUs, this project proposes to use another class of EC called Evolutionary Programming (EP), which applies mutations locally. The project studies in-depth how to efficiently map an EP algorithm to SIMD GPUs, including a scalable and visualizable genome map, mutation, tournament and selection, and finally convergence visualization. Intensive experiments and careful comparisons are conducted to demonstrate its performance speedup and accuracy. The project also shows that it is *conceptually wrong* and *infeasible* to generate high-quality random numbers on the current generation of GPUs and that the low-quality random numbers will lead to poor performance of EC. (K. L. Fok, T. T. Wong, and M. L. Wong, “Evolutionary Computing on Consumer-Level Graphics Hardware”, *IEEE Intelligent Systems*, and “Parallel Evolutionary Algorithms on Graphics Processing Unit” in *Proc. of IEEE Congress on Evolutionary Computation 2005*.)