Sh Version 0.8rc0 Released

November 10th, 2005

Sh Version 0.8.0rc0, the first release candidate for the upcoming Sh 0.8, is now available. There are plenty of new features and bug fixes, but most importantly this release has an API that completely matches the book Metaprogramming GPUs with Sh, which the 0.8.x series of releases will stick to.

Call for Papers: GPGPU Workshop at ICCS 2006

November 2nd, 2005

A special workshop dedicated to “GPGPU: Methods, algorithms and applications” will be hosted in conjunction with the 6th International Conference On Computational Science (ICCS 2006). The primary goal of this session is to present the GPU as a powerful parallel processor. Contributions presenting novel, original work in all areas of GPGPU, e.g. from hardware abstractions to specific applications, are cordially invited. For more information, please refer to the official workshop web page.

GPGPU Application wins 1st prize in IEEE Visualization Contest

October 31st, 2005

Jens Schneider, Polina Kondratieva, Jens Krüger, and Rüdiger Westermann from TU Munich have won the 2005 IEEE Visualization Contest with their work “All you need is particles!” Check out the video of their results; it’s very interesting.

Ray Tracing the Quaternion Julia Set on the GPU

October 25th, 2005

The quaternion Julia fractal is a complex and beautiful object, yet its parameter space is difficult to explore due to the high cost of visualization. Fortunately, rendering the Julia set by ray tracing or “sphere tracing” its surface is an algorithm well suited to the GPU: it has high arithmetic intensity and uses virtually no bandwidth. A GPU implementation (with source) of this algorithm that allows real-time interaction with the Julia set has been made available by Keenan Crane.

“Instigating a platform tug of war: Graphics vendors hunger for CPU suppliers’ turf”

October 17th, 2005

This article by EDN Senior Technical Editor Brian Dipert investigates the increasing generalization of the GPU business, covering wide-ranging topics including the AGP-to-PCI-express platform transition; GPGPU; video codecs; image editing; and physics simulation. In addition to this article, Dipert will be continuing his exploration of these topics in his blog. He has already posted several entries related to the main article (1, 2, 3, 4, 5, 6).
(Instigating a platform tug of war: Graphics vendors hunger for CPU suppliers’ turf. Brian Dipert.)

Toward Real-Time Fractal Image Compression Using Graphics Hardware

October 17th, 2005

This ISVC 2005 paper by Ugo Erra presents parallel fractal image compression using programmable graphics hardware. The main problem of fractal compression is the very high computing time needed to encode images. The implementation in this paper exploits the SIMD architecture and inherent parallelism of recent GPUs to speed up the baseline approach of fractal encoding. The results presented are achieved on inexpensive and widely available graphics boards. (Toward Real-Time Fractal Image Compression Using Graphics Hardware. Ugo Erra. In Proceedings of International Symposium on Visual Computing 2005)
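To illustrate why encoding is so expensive, here is a minimal CPU sketch of the textbook baseline search: every range block is compared against every candidate domain block, and each comparison solves a small least-squares fit for a contrast scale and brightness offset. This is not the paper's GPU implementation; the function names and the flat-list block representation are assumptions made for illustration.

```python
def fit_block(domain, rng):
    """Least-squares fit of rng ~ s * domain + o for two equal-length pixel lists.

    Returns (s, o, squared_error). Hypothetical helper, not from the paper.
    """
    n = len(domain)
    sum_d = sum(domain)
    sum_r = sum(rng)
    sum_dd = sum(d * d for d in domain)
    sum_dr = sum(d * r for d, r in zip(domain, rng))
    denom = n * sum_dd - sum_d * sum_d
    # A flat domain block carries no contrast, so only the offset can be fitted.
    s = 0.0 if denom == 0 else (n * sum_dr - sum_d * sum_r) / denom
    o = (sum_r - s * sum_d) / n
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, rng))
    return s, o, err


def best_domain(rng, domains):
    """Exhaustively search all domain blocks for the best match to one range block."""
    best = None
    for index, domain in enumerate(domains):
        s, o, err = fit_block(domain, rng)
        if best is None or err < best[3]:
            best = (index, s, o, err)
    return best


range_block = [10, 20, 30, 40]
domain_blocks = [[1, 1, 1, 1], [0, 2, 4, 6], [5, 1, 7, 2]]
print(best_domain(range_block, domain_blocks))
```

Each range block's search is independent of every other range block's, which is the kind of parallelism a SIMD architecture can exploit.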

ATI Announces "X1K" Family of Graphics Processors

October 6th, 2005

Yesterday ATI announced its new line of GPUs, the X1K family. This family includes the flagship Radeon X1800 XT and XL GPUs (codenamed R520), the mid-range Radeon X1600 XT and Pro GPUs (codenamed RV530), and the mainstream Radeon X1300 and X1300 Pro GPUs (codenamed RV515). For a detailed overview, see the articles at ExtremeTech or Beyond3D. ATI has also announced preliminary plans to enable GPGPU development by publishing a detailed spec and a thin abstraction interface for programming the new GPUs.

Approximate Ray-Tracing on the GPU with Distance Impostors

October 6th, 2005

This paper presents a fast approximation method for finding the point hit by a reflection or refraction ray. The calculation is based on the distance values stored in environment map texels. The approximation is used to localize environment-mapped reflections and refractions; that is, to make them depend on where they occur. Placing the eye at the light source also allows the method to generate real-time caustics. By computing a map for each refractor surface, multiple refractions can even be evaluated without tracing rays. The method is fast, and it is accurate when the scene consists of large planar faces, in which case the results are similar to those of ray tracing. The method also maps well to the GPU architecture and can render ray-tracing and global illumination effects at a few hundred frames per second. The primary application area of the proposed method is the introduction of these effects in games. (Approximate Ray-Tracing on the GPU with Distance Impostors. Laszlo Szirmay-Kalos, Barnabas Aszodi, Istvan Lazanyi, and Matyas Premecz. Department of Control Engineering and Information Technology, Technical University of Budapest.)

Real-Time, GPU-Based Foreground-Background Segmentation

October 6th, 2005

Robust and accurate foreground-background segmentation is a relatively small but crucial step in several computer vision applications. It is a key element in surveillance, 3D modelling from silhouettes, motion capture, and gesture analysis for human-computer interaction (HCI). Several of these applications require real-time processing, so the segmentation step must be extremely fast. This work by Andreas Griesser of ETH Zurich proposes a high-speed GPU-based implementation that processes image sequences in less than 4 ms per frame and frees the CPU from this processing step altogether. The resulting segmentation is compact and smooth in foreground areas and temporally contiguous between frames. (Project homepage and software download. Andreas Griesser, Computer Vision Lab, ETH Zuerich.)

An Implementation of a FIR Filter on a GPU

September 19th, 2005

Alexey Smirnov and Tzi-cker Chiueh from Stony Brook University have published a technical report describing an implementation of a FIR filter on a GPU. The results of the performance evaluation using a GeForce 6600 video card and a Pentium 4-HT 3.2 GHz-based PC indicate that the GPU implementation outperforms the SSE-optimized CPU implementation for certain input parameters. (FIR on GPU project. Report: An Implementation of a FIR Filter on a GPU (warning: postscript). Technical Report, Experimental Computer Systems Lab, Stony Brook University, 2005.)
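
For readers unfamiliar with the operation, a FIR filter computes each output sample as a weighted sum of the most recent input samples. The sketch below is a plain CPU reference for that definition, not the report's GPU or SSE code; the function name and the zero-padding of samples before the start of the signal are assumptions made for illustration.

```python
def fir_filter(signal, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * signal[n - k].

    Samples before the start of the signal are treated as zero (an assumption).
    """
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        output.append(acc)
    return output


# A 4-tap moving average as an example set of coefficients.
moving_average = [0.25, 0.25, 0.25, 0.25]
print(fir_filter([4.0, 8.0, 12.0, 16.0, 16.0], moving_average))
# prints [1.0, 3.0, 6.0, 10.0, 13.0]
```

Every output sample depends only on the inputs and the coefficients, never on other outputs, so the outer loop can in principle be evaluated in parallel.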