GPU VSIPL Library

March 31st, 2009

GPU VSIPL is an implementation of the Vector Signal Image Processing Library (VSIPL) that targets Graphics Processing Units (GPUs) supporting NVIDIA’s CUDA platform. By leveraging processors capable of 900 GFLOP/s or more, your application may achieve considerable speedup without any specialized development for GPUs. The GPU VSIPL range-Doppler map application achieved a 75x speedup on the GPU simply by being linked against GPU VSIPL.
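A range-Doppler map is formed by taking, for each range bin, a Fourier transform across the pulses (slow time), so each output bin corresponds to a target Doppler shift. The following is a minimal Python sketch of that step, assuming the pulse returns are already range-compressed and using a naive DFT where a library such as GPU VSIPL would call an FFT. `dft` and `range_doppler_map` are hypothetical names, not part of the VSIPL API.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def range_doppler_map(pulses):
    """pulses[p][r]: range-compressed sample of pulse p in range bin r (assumed input).
    Returns rd[r][k]: magnitude of Doppler bin k in range bin r."""
    num_pulses = len(pulses)
    num_bins = len(pulses[0])
    rd = []
    for r in range(num_bins):
        # Gather the slow-time sequence for this range bin and transform it.
        slow_time = [pulses[p][r] for p in range(num_pulses)]
        rd.append([abs(v) for v in dft(slow_time)])
    return rd
```

Every range bin is transformed independently, which is the kind of batched, data-parallel work that maps well onto a GPU.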

GPU VSIPL is currently released as a static library, and all releases are verified with the VSIPL Core Lite Test Suite.

GPU VSIPL was presented at the High Performance Embedded Computing Workshop 2008. Read the GPU VSIPL extended abstract (PDF). For more information, visit the GPU VSIPL website.

Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting

February 11th, 2008

Abstract: “The widespread usage of the Discrete Wavelet Transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the most popular schemes, known as Filter Bank (FBS) and Lifting (LS), and have always concluded that Lifting is the most efficient option. However, there is no such study on streaming processors such as modern Graphic Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, as well as the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS in current generation GPUs. In our experiments, the actual FBS gains range between 10% and 140%, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future generation GPUs.” (Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting. Christian Tenllado, Javier Setoain, Manuel Prieto, Luis Piñuel, and Francisco Tirado. IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 3, pp. 299–310, March 2008.)
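To make the FBS/LS distinction concrete, here is a minimal 1D sketch using the Haar wavelet, chosen only for brevity (the paper studies longer filters). The filter-bank version applies the low- and high-pass analysis filters and keeps every other output; the lifting version splits the signal into even and odd samples and applies a predict step and an update step. Both yield the same coefficients. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_filter_bank(x):
    """Filter bank: analysis filters (1, 1)/sqrt2 and (1, -1)/sqrt2, downsampled by 2."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return low, high

def haar_lifting(x):
    """Lifting: split into even/odd, predict, update, then normalize."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]        # predict: odd minus its even neighbour
    s = [e + di / 2 for e, di in zip(even, d)]    # update: s becomes the pairwise mean
    low = [si * SQRT2 for si in s]                # scale to match the orthonormal filters
    high = [-di / SQRT2 for di in d]
    return low, high
```

For Haar the two schemes cost about the same; the arithmetic savings of lifting matter mainly for longer filters.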

GPUFFTW: High Performance GPU-based FFT Library

May 30th, 2006

This paper by Govindaraju et al. describes a high-performance FFT algorithm for GPUs. The algorithm is tuned for GPUs through memory optimizations and further improves performance with pipelining strategies. In practice, it achieves 4x higher computational performance on a $500 NVIDIA GPU than optimized single-precision FFT algorithms on high-end CPUs costing $1500. (“Efficient memory model for scientific algorithms on graphics processors”, Naga Govindaraju, Scott Larsen, Jim Gray and Dinesh Manocha, UNC Tech. Report, 2006)
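GPU FFTs are commonly organized as log2(N) passes, each a data-parallel sweep of butterflies over the whole array. The sketch below is a generic iterative radix-2 Cooley-Tukey FFT in Python that shows this pass structure; it is not GPUFFTW's algorithm or memory layout, which the paper tunes specifically for GPU memory.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) passes; within a pass every butterfly is independent.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```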

An Implementation of a FIR Filter on a GPU

September 19th, 2005

Alexey Smirnov and Tzi-cker Chiueh from Stony Brook University have published a technical report describing an implementation of a FIR filter on a GPU. Their performance evaluation, using a GeForce 6600 video card and a Pentium 4-HT 3.2 GHz-based PC, indicates that the GPU implementation outperforms the SSE-optimized CPU implementation for certain input parameters. (FIR on GPU project. Report: An Implementation of a FIR Filter on a GPU (PostScript). Technical Report, Experimental Computer Systems Lab, Stony Brook University, 2005.)
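A direct-form FIR filter computes y[n] as the sum over k of h[k] * x[n - k]. Because each output sample depends only on input samples, every y[n] can be computed independently, which is what makes the filter a natural fit for per-pixel (or per-thread) GPU parallelism. A minimal CPU reference in Python, assuming zero initial conditions (x[m] = 0 for m < 0); `fir_filter` is a hypothetical name:

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)  # y[n] never reads another output, so outputs are independent
    return y
```

For example, `fir_filter([1, 2, 3, 4], [0.5, 0.5])` is a two-tap moving average.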

GPU Accelerated General Purpose Data Processing with MAX/MSP/Jitter

August 11th, 2005

The latest versions of Cycling ’74’s MAX/MSP/Jitter software packages provide a visual programming environment for new media, with applications in GPU-based stream processing, real-time video processing, volume visualization, and generic n-dimensional data analysis and signal processing. Jitter supports cascaded GLSL/Cg/ARB/NV shader programs with a streamlined render-to-texture interface, allowing fast prototyping of complex shader effects that are processed in a generic data flow network. (Jitter v1.5 Upgrade Info. Cycling ’74.)

Audio and the Graphics Processing Unit

May 16th, 2005

From the abstract: In recent years, the development of programmable graphics pipelines has placed the power of parallel computation in the hands of consumers. Systems developers are now paying attention to the general purpose computational ability of these graphics processor units, or GPUs, and are using them in novel ways. This paper examines using pixel shaders for executing audio algorithms. We compare GPU performance to CPU performance, discuss problems encountered, and suggest new directions for supporting the needs of the audio community. Source code is also available. (“Audio and the Graphics Processing Unit”, by Sean Whalen)

Fourier Volume Rendering on the GPU Using a Split-Stream FFT

March 1st, 2005

This paper by Jansen et al. describes how to use current commodity graphics hardware to perform Fourier volume rendering directly on the GPU. The paper presents a novel implementation of the Fast Fourier Transform, the Split-Stream FFT, which maps the recursive structure of the FFT efficiently onto the GPU. Additionally, high-quality resampling within the frequency domain is discussed. The implementation enables visualization of large volumetric data sets at interactive frame rates on a mid-range computer system. (Fourier Volume Rendering on the GPU Using a Split-Stream FFT)
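Fourier volume rendering rests on the Fourier projection-slice theorem: the Fourier transform of a projection equals a central slice of the data's Fourier transform, so after one 3D transform each view needs only a slice extraction and an inverse 2D transform. The Python sketch below checks the 2D analogue with naive DFTs: the projection of an image along y is recovered from the k_y = 0 row of its 2D spectrum. The function names are hypothetical, and only an axis-aligned slice is taken; arbitrary view angles require the frequency-domain resampling the paper discusses.

```python
import cmath

def dft2(img):
    """Naive 2D DFT; spectrum[ky][kx]."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (ky * y / rows + kx * x / cols))
                 for y in range(rows) for x in range(cols))
             for kx in range(cols)]
            for ky in range(rows)]

def idft1(spec):
    """Naive inverse 1D DFT."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def projection_via_slice(img):
    """Projection along y (sum over rows), computed from the central k_y = 0 slice."""
    central_slice = dft2(img)[0]
    return [v.real for v in idft1(central_slice)]
```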

The Discrete Wavelet Transform on a GPU

March 23rd, 2004

This website presents a fast GPU algorithm for the discrete wavelet transform featuring flexible boundary extension schemes, flexible wavelet kernels, a Cg shader implementation, and high precision. The algorithm was developed by the Graphics Team at The Chinese University of Hong Kong. The beauty of the method is that the forward and inverse wavelet transforms are unified through position-dependent filtering and convolution, combined with an indirect addressing technique. The software is open source and free for any commercial or academic use, and is currently available both as an unofficial GPU extension to the JasPer JPEG 2000 software and as a standalone DWTGPU C++ class with a demo program. (Jianqing Wang, Tien-Tsin Wong, Pheng-Ann Heng and Chi-Sing Leung. The Discrete Wavelet Transform on a GPU.)
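The idea of position-dependent filtering with indirect addressing can be sketched in 1D: every output position runs the same "kernel", which picks the low-pass or high-pass filter from the parity of its own index and reads neighbours through a lookup table that implements the boundary extension. The Python sketch below is an illustration under assumptions (LeGall 5/3 analysis filters, whole-sample symmetric extension, CPU loops in place of Cg shaders); the names are hypothetical and this is not the DWTGPU code.

```python
# Assumed filters: LeGall 5/3 analysis taps, keyed by offset from the output position.
LOW = {-2: -1 / 8, -1: 1 / 4, 0: 3 / 4, 1: 1 / 4, 2: -1 / 8}
HIGH = {-1: -1 / 2, 0: 1.0, 1: -1 / 2}

def mirror_table(n, pad):
    """Indirect-addressing table: logical index -> physical index (symmetric extension)."""
    table = {}
    for m in range(-pad, n + pad):
        p = -m if m < 0 else m
        if p >= n:
            p = 2 * (n - 1) - p
        table[m] = p
    return table

def dwt_position_dependent(x):
    """One DWT level; the filter is chosen by output position, reads go through the table."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples for single mirroring")
    table = mirror_table(n, pad=2)
    out = []
    for i in range(n):
        taps = LOW if i % 2 == 0 else HIGH  # even positions: low-pass, odd: high-pass
        out.append(sum(c * x[table[i + k]] for k, c in taps.items()))
    return out[0::2], out[1::2]  # regroup into approximation and detail bands
```

Changing the boundary scheme only changes the lookup table, not the per-position kernel.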

Accelerating Wavelet Transformations with Graphics Hardware

February 19th, 2004

Two papers from the VIS Group Stuttgart describe implementations of wavelet-based multi-resolution analysis using OpenGL. Wavelets are commonly used for signal processing and image compression (e.g. for JPEG 2000). The papers focus on details of implementing wavelet decomposition and reconstruction using graphics hardware, and develop a scaled version of wavelet analysis that constrains data to the [0,1] range of fixed-point frame buffers. See also the project page for more about hardware-based filtering. (Hardware-Based Wavelet Transformations. Matthias Hopf and Thomas Ertl. Workshop on Vision, Modeling, and Visualization 1999, pp. 317–328. Hardware-Accelerated Wavelet Transformations. Matthias Hopf and Thomas Ertl. Proc. EG/IEEE TCVG Symposium on Visualization VisSym 2000, pp. 93–103.)
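One simple way to keep wavelet coefficients inside [0,1] is a Haar step built from pairwise averages and offset half-differences: when both inputs lie in [0,1], so do (x0 + x1)/2 and (x0 - x1)/2 + 0.5. The Python sketch below illustrates that constraint under this assumption; it is not necessarily the scaling used in the papers, and it ignores the quantization a fixed-point frame buffer would introduce. The function names are hypothetical.

```python
def scaled_haar_forward(x):
    """Haar step whose outputs stay in [0, 1] when the inputs do (even-length input)."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    diff = [(x[2 * i] - x[2 * i + 1]) / 2 + 0.5 for i in range(half)]  # offset by 0.5
    return avg, diff

def scaled_haar_inverse(avg, diff):
    """Undo the offset, then reconstruct each pair exactly."""
    x = []
    for a, d in zip(avg, diff):
        d0 = d - 0.5
        x.extend([a + d0, a - d0])
    return x
```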

Accelerating 3D Convolution using Graphics Hardware

February 19th, 2004

This paper from the VIS Group Stuttgart shows the first volume filtering algorithm that uses OpenGL for the convolution process. Filtering volume data is useful for noise reduction, feature detection, and segmentation. The process is significantly accelerated on SGI graphics workstations with hardware support for two-dimensional image convolution in the frame buffer. Generic 3D convolution can be added as a powerful tool in interactive volume visualization toolkits. See also the project page for more about hardware-based filtering. (Accelerating 3D Convolution using Graphics Hardware. Matthias Hopf and Thomas Ertl. Proc. Visualization 1999, pp. 471–474.)
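Using 2D hardware convolution for volumes relies on a decomposition: each output slice of a 3D filter is the sum, over the kernel's depth, of neighbouring input slices filtered with the matching 2D kernel slice. The Python sketch below shows that decomposition on the CPU, assuming zero padding at the borders and the correlation form of filtering (identical to convolution for symmetric kernels). The function names are hypothetical; in the paper's setting the 2D step would run in the frame buffer's convolution hardware.

```python
def filter2d(img, ker):
    """2D filtering (correlation form), zero padding, odd-sized kernel, same-size output."""
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - ry, x + i - rx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += ker[j][i] * img[yy][xx]
            out[y][x] = acc
    return out

def filter3d_by_slices(vol, ker):
    """vol[z][y][x], ker[dz][dy][dx]: accumulate 2D-filtered neighbouring slices."""
    depth, kd = len(vol), len(ker)
    rz = kd // 2
    h, w = len(vol[0]), len(vol[0][0])
    out = []
    for z in range(depth):
        acc = [[0.0] * w for _ in range(h)]
        for j in range(kd):
            zz = z + j - rz
            if 0 <= zz < depth:
                part = filter2d(vol[zz], ker[j])  # the step 2D convolution hardware would do
                for y in range(h):
                    for x in range(w):
                        acc[y][x] += part[y][x]
        out.append(acc)
    return out
```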
