## Implementation of float-float operators on graphics hardware

July 22nd, 2006

Abstract: The Graphics Processing Unit (GPU) has evolved into a powerful and flexible processor. The latest graphics processors provide fully programmable vertex and pixel processing units that support vector operations up to single floating-point precision. This computational power is now being used for general-purpose computations. However, some applications require higher precision than single precision. This paper describes the emulation of a 44-bit floating-point number format and its corresponding operations. An implementation is presented along with performance and accuracy results. (G. Da Graca, D. Defour. Implementation of float-float operators on graphics hardware. 7th conference on Real Numbers and Computers, RNC7, Nancy, France, July 2006.)

## GPGPU “Birds of a Feather” at SIGGRAPH 2006

July 19th, 2006

- Location: Boston Convention and Exhibition Center
- Room: 108
- Date: Thursday, August 3rd
- Time: 11:00am-1:00pm

Since no GPGPU course is being offered at SIGGRAPH this year, we have scheduled a GPGPU “Birds of a Feather” (BOF) session for everyone at SIGGRAPH who is interested in GPGPU. The current plan is for the BOF to be an informal gathering to chat about GPGPU. Many of the academics doing research in GPGPU plan to be there, as well as industry folks, including people from ATI and NVIDIA.

Since the BOF is scheduled during lunch, it will be a “brown bag” event, so bring lunch with you. We’ll keep you updated on any status changes in the forums. (Credit goes to Mike Houston for organizing.)

## Fantasy Lab introduces GPU-accelerated real-time global illumination engine with displacement-mapped subdivision surfaces

June 30th, 2006

Fantasy Lab, a game developer located in the San Francisco Bay Area, has announced its new game engine, which includes support for real-time global illumination and displacement-mapped subdivision surfaces. Videos on the company’s website show global illumination on an animated subdivision-surface-based character. The global illumination solution for the videos is calculated in 3.3 milliseconds per frame (roughly 300 frames per second) on an NVIDIA GeForce Go 7900 GTX (a laptop GPU).

## A work-efficient step-efficient prefix-sum algorithm

June 22nd, 2006

This extended abstract by Sengupta et al. presents a work-efficient step-efficient prefix-sum algorithm. The algorithm achieves a three- to four-fold speedup over the step-efficient prefix-sum algorithm presented by Daniel Horn in GPU Gems 2. It can also be tuned to run efficiently on future hardware with a higher degree of parallelism. (A work-efficient step-efficient prefix-sum algorithm. Shubhabrata Sengupta, Aaron E. Lefohn, John D. Owens, in Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures.)
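For readers unfamiliar with the term, the sketch below illustrates what "work-efficient" means for a prefix sum. It is a sequential Python stand-in for the classic up-sweep/down-sweep scan, not the hybrid GPU algorithm from the paper. The function name `exclusive_scan` and the power-of-two length restriction are assumptions made for brevity.

```python
def exclusive_scan(values):
    """Work-efficient exclusive prefix sum (up-sweep, then down-sweep).

    Sequential stand-in: the iterations of each inner loop are independent
    and would run in parallel on a GPU. Assumes a power-of-two length.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    data = list(values)

    # Up-sweep (reduce): build partial sums in place, about n additions total.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

Both sweeps together perform O(n) additions in about 2 log2(n) parallel steps. A step-efficient scan instead finishes in log2(n) steps but performs O(n log n) additions, which is the trade-off the title refers to.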

## TyphoonLabs OpenGL Shader Designer source code released

June 21st, 2006

TyphoonLabs has released the OpenGL Shader Designer source code, in response to many requests, and in gratitude to the OpenGL community. The source code is released under the LGPL license and can be downloaded from http://www.typhoonlabs.com in the downloads section. Linux binaries will be released later this month.

## MSR Accelerator Now Available for Download

June 21st, 2006

The Accelerator GPGPU programming system from Microsoft Research is now available for download. The system was mentioned previously here on gpgpu.org. A key purpose in releasing the software is to get feedback from the GPGPU community about the programming model and the API. Microsoft Research is also interested in building higher-level libraries using the system.

(http://research.microsoft.com/downloads. Also see the Accelerator Project Wiki.)

## GPGPU Tutorial and Sample Code

June 20th, 2006

A half-day GPGPU tutorial session was given by Dominik Göddeke and Robert Strzodka in conjunction with the ICCS 2006 conference in Reading, UK. After a comprehensive introduction to the GPU programming model with many examples, ways to increase performance and accuracy in GPGPU applications were presented. (Slides and tutorial code)

## Eight GPGPU Papers Presented at ICCS 2006 GPGPU Workshop

June 20th, 2006

Abstracts, citations and links to author homepages of eight papers on GPGPU presented at the ICCS conference, Reading, UK, May 2006, are available. Topics include genome sequencing, GPGPU languages, database operations, computational fluid dynamics, computer vision, computational geometry and neural networks. (http://www.mathematik.uni-dortmund.de/~goeddeke/iccs/papers.html)

## Cholesky Decomposition and Linear Programming on a GPU

June 6th, 2006

The rapid evolution of GPUs in performance, architecture, and programmability provides general and scientific computational potential beyond their primary purpose, graphics processing. This work presents an efficient algorithm for solving symmetric positive definite linear systems using the GPU. Using the decomposition algorithm and other basic building blocks for linear algebra on the GPU, the paper demonstrates a GPU-powered linear program solver based on a Primal-Dual Interior-Point Method. (Cholesky Decomposition and Linear Programming on a GPU, Jin Hyuk Jung, Scholarly Paper Directed by Dianne P. O’Leary, Department of Computer Science, University of Maryland, 2006.)
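As background, here is a minimal CPU-side sketch of how a Cholesky factorization solves a symmetric positive definite system. It is the textbook algorithm in plain Python, not the paper's GPU formulation. The names `cholesky` and `solve_spd` are illustrative assumptions.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] after earlier columns.
        d = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a x = b via a = L L^T: forward then backward substitution."""
    low = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
print(cholesky(A))
print(solve_spd(A, [0.0, 6.0, 39.0]))
```

Interior-point methods solve a system of this kind at every iteration, which is why a fast factorization is the key building block for the linear program solver the paper describes.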

## GPUFFTW: High Performance GPU-based FFT Library

May 30th, 2006

This paper by Govindaraju et al. describes a high-performance FFT algorithm on GPUs. The algorithm is highly tuned for GPUs using memory optimizations. It further improves performance using pipelining strategies. In practice, it is able to achieve 4x higher computational performance on a $500 NVIDIA GPU than optimized single-precision FFT algorithms on high-end CPUs costing $1500. (“Efficient memory model for scientific algorithms on graphics processors”, Naga Govindaraju, Scott Larsen, Jim Gray and Dinesh Manocha, UNC Tech. Report 2006)