Supercomputing ’06 Workshop: "General-Purpose GPU Computing: Practice and Experience"

August 21st, 2006

SC’06 is proud to announce the “General-Purpose GPU Computing: Practice and Experience” workshop. This workshop features invited speakers and poster presenters who provide insights into current GPGPU practice and experience, and chart future directions in heterogeneous and homogeneous multi-core processor architectures and data-parallel processor architectures such as GPUs. The topics addressed by the speakers range from current GPGPU practice and experience to future issues and research areas in parallel computing currently being driven by GPGPU innovations and lessons learned, such as the IBM Cell Broadband Engine and Sun Microsystems’ Niagara/Sun4v processor. Poster presentations are solicited in, but not strictly limited to, the following areas:

  • Application acceleration
  • GPGPU/multi-core/parallel coprocessor integration: toolkits, implementation techniques (e.g., iterative refinement, numerical techniques), domain-specific languages
  • GPGPU implementation issues: performance issues and challenges, cooperative GPU/CPU algorithms and solutions, numerical analysis issues, HPC issues
  • Cluster-based GPGPU computing and Grid integration

Please submit prospective poster abstracts in PDF or PostScript format to the workshop chair for consideration and review. Poster abstract submission deadline: No later than October 1st. (For more information see

EM Photonics releases free GPU-Based FDTD Accelerator

August 18th, 2006

EM Photonics, Inc., a leading provider of accelerated hardware technologies, released FastFDTD, a free 2D and 3D accelerated FDTD solver based on GPU technology. The FastFDTD toolkit contains all files and documentation necessary to accelerate FDTD computations using a simple input file format. The 2D and 3D solvers include a variety of sources and materials, and more are being added. When asked why EM Photonics was providing this toolkit for free, Eric Kelmelis, Vice President, said:

We decided to release our GPU-based FDTD accelerator free of charge to demonstrate the power of application acceleration with alternative computational platforms. This solver shows a single graphics card running 20-30 times faster than an optimized software implementation. Our focus will remain on pushing the boundaries of this technology and accelerating other applications with commodity hardware devices such as graphics cards and FPGAs.

For more information, including specific feature sets, compatible graphics cards, and detailed license information, please visit the FastFDTD webpage at

GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

August 14th, 2006

This paper presents a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). The approach is competitive with common sequential sorting algorithms not only in theory but also in practice. The paper presents an implementation on modern programmable graphics hardware (GPUs). On recent GPUs this optimal parallel sorting approach has been shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. (GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures. Alexander Gress and Gabriel Zachmann. Proc. 20th IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS), 2006.)
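To give a feel for the underlying idea, here is a minimal sketch of the classic bitonic sorting network in Python. It is not the adaptive bitonic sort from the paper: the plain network performs O(n log² n) comparisons, whereas the adaptive variant reduces this to O(n log n). The function name and the power-of-two restriction are assumptions made for this illustration.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Illustrative sketch only: this is the plain (non-adaptive) network.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare distance within this merge step
            # All compare-exchange operations in this pass touch disjoint
            # pairs, so on a stream processor each index i could be handled
            # by an independent kernel invocation.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Because every pass consists of independent compare-exchange operations with a fixed access pattern, the network maps naturally onto data-parallel hardware, which is what makes bitonic schemes attractive on GPUs.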

A New Low-Level Interface for GPGPU Applications on ATI GPUs

August 10th, 2006

At SIGGRAPH in Boston, Derek Gerstmann of ATI presented a sketch titled “A Performance-Oriented Data Parallel Virtual Machine for GPGPU Applications.” The system exposes GPU functionality at a low level (including the fragment processors’ native instruction set), giving the programmer direct control over program compilation and loading, GPU memory management, and GPU/CPU synchronization. A write-up is available at If you are interested in obtaining the system for evaluation, please contact

SIGGRAPH Poster: Extended-Precision Floating-Point Numbers for GPU Computation

August 10th, 2006

Using unevaluated sums of paired or quadrupled single-precision (f32) values, double-float (df64) and quad-float (qf128) numeric types can be implemented on current GPUs and used efficiently and effectively for extended-precision computation in real and complex arithmetic. These numeric types provide 48 and 96 bits of precision, respectively, at f32 exponent ranges for computer graphics and general-purpose (GPGPU) programming. Double- and quad-floats may be useful not only for extending available precision but also for accurate computation with single-precision floats that are only partially IEEE compliant. The poster and demos presented at ACM SIGGRAPH 06 discussed the implementation and application of these numbers in the Cg language for real and complex GPU programming. The df64 library includes math routines for exponential, log, and trigonometric functions. The poster can be downloaded from Andrew Thall’s website. Technical details will be available shortly, and the code itself will be made available for distribution given sufficient interest.
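The "unevaluated sum" idea can be sketched in a few lines of Python. This is not the Cg library described in the poster; it is a hedged illustration that emulates f32 arithmetic by rounding through the `struct` module and uses Knuth's error-free two-sum. All function names (`f32`, `two_sum`, `df64_from_float`, `df64_add`, `df64_to_float`) are hypothetical and chosen for this example only.

```python
import struct

def f32(x):
    """Round a Python float (f64) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth's two-sum: return (s, err) with s = fl(a + b) and s + err = a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err

def df64_from_float(x):
    """Split a value into a (hi, lo) pair of f32 numbers with hi + lo close to x."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def df64_add(x, y):
    """Add two double-floats, each stored as an unevaluated (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)              # renormalize so that lo stays small
    lo = f32(e - f32(hi - s))
    return hi, lo

def df64_to_float(x):
    """Collapse the pair back into one Python float for inspection."""
    return x[0] + x[1]
```

As a usage example, adding 1e-9 to 1.0 in plain f32 loses the small term entirely, because f32(1.0 + 1e-9) rounds back to 1.0, while `df64_add(df64_from_float(1.0), df64_from_float(1e-9))` keeps it in the low word of the result.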

SIGGRAPH Poster: GPU Histogram Computation

August 10th, 2006

This SIGGRAPH poster by Oliver Fluck et al. presents an approach to computing histograms in fragment shaders. The proposed method enables iterative and histogram-guided algorithms to run on GPUs and avoids data transfer between the GPU and main memory. The algorithm has been demonstrated using the example of a GPU level set segmentation. (GPU Histogram Computation)

GPU_KLT: A GPU-based Implementation of the Kanade-Lucas-Tomasi Feature Tracker

August 10th, 2006

GPU_KLT is an implementation (using OpenGL/Cg) of the popular KLT feature tracker which runs primarily on the graphics processing unit (GPU). The GPU-based implementation emulates Stan Birchfield’s KLT implementation of the original algorithm proposed by Kanade, Lucas and Tomasi (1991). GPU_KLT tracks approximately 1000 feature points within 1024×768 resolution video at 30 Hz on an ATI 1900 XT and at 25 Hz on an NVIDIA GeForce 7900 GTX. It can be used for real-time computer vision systems involving object detection, structure from motion, robot navigation and video surveillance. Source code is available for research use on the GPU_KLT webpage. (Sudipta N. Sinha, Jan-Michael Frahm, Marc Pollefeys and Yakup Genc, “Feature Tracking and Matching in Video Using Programmable Graphics Hardware”, submitted to Machine Vision and Applications, July 2006.)

Real-Time Relativistic Optical Calculations on the GPU

August 10th, 2006

This paper by Savage, Searle and McCalman describes a program which uses the built-in support for 4-vector/matrix operations on a programmable GPU to perform Lorentz transformations on relativistic 4-momentum vectors in real time. This allows a pixel shader to render relativistic effects such as geometric aberration, Doppler shift and the headlight effect in response to user interaction. A program, “Real-Time Relativity”, has been written to demonstrate these effects. (Real-Time Relativity. C. M. Savage, A. C. Searle, L. McCalman. Physics ArXiv)
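The core operation, a 4×4 Lorentz matrix applied to a 4-momentum vector, can be sketched on the CPU as follows. This is an illustrative Python sketch, not the shader code from the paper. It assumes units with c = 1, a 4-momentum layout of (E, px, py, pz), and an observer moving along the +x axis. The function names are hypothetical.

```python
import math

def boost_x(beta):
    """4x4 Lorentz boost matrix for an observer moving along +x at speed beta (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [
        [gamma, -gamma * beta, 0.0, 0.0],
        [-gamma * beta, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(matrix, p):
    """Apply a 4x4 matrix to a 4-vector; a GPU does this as one matrix-vector multiply."""
    return [sum(row[k] * p[k] for k in range(4)) for row in matrix]

def doppler_factor(beta, photon):
    """Ratio of observed to emitted photon energy (equivalently, frequency)."""
    return transform(boost_x(beta), photon)[0] / photon[0]
```

For a photon travelling along +x, `doppler_factor(0.6, [1.0, 1.0, 0.0, 0.0])` comes out at approximately 0.5, which matches the textbook value sqrt((1 - β)/(1 + β)). The spatial part of the transformed vector gives the aberrated direction from which the light appears to arrive.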

Ph.D. dissertation discusses GPU-accelerated advanced rendering and image processing techniques

August 10th, 2006

The Ph.D. dissertation Rendering Methods for Augmented Reality by Jan Fischer describes several GPU-based methods for artistic and illustrative rendering. A real-time video filter is described, which generates a cartoon-like version of the input video and is executed entirely on the GPU (Section 3.3). Section 4.2 of the thesis discusses a GPU-based algorithm for the real-time illustrative display of hidden structures in polygonal datasets. In Section 4.3, the real-time conversion of augmented reality video streams into an illustrative style on the GPU is described. The thesis discusses the underlying image processing and rendering algorithms as well as implementation-specific aspects of the respective GPU techniques. (Jan Fischer, Rendering Methods for Augmented Reality, Dissertation, University of Tübingen, June 2006)

Geomerics Demonstrate Real-Time Radiosity on the GPU

August 9th, 2006

Geomerics, a new R&D company based in Cambridge, UK, have recently announced a real-time radiosity simulation running entirely on the GPU. The solution runs at up to 100 Hz on common graphics hardware and allows for fully dynamic lighting, including spot-lights, projected texture or video lighting, and area lights. It integrates well with traditional modeling techniques such as normal mapping, and all lighting is performed in high dynamic range. Videos, screen shots and further details of the simulation can be found on the Geomerics website.
