First TopCoder CUDA SuperHero Challenge Under Way

September 16th, 2009

Today NVIDIA and TopCoder launched the first contest in the CUDA SuperHero Challenge.

The first contest challenges participants to develop the highest performing solution for GPU-Accelerated Connected Component Labeling of images. CCL is a simple but computationally intensive image processing operation that is used in many applications including machine vision, real-time object recognition, and security.
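
For readers new to the operation, the following is a minimal CPU reference sketch of connected component labeling, not a contest solution and not GPU code. It assumes a binary image stored as a list of rows and 4-connectivity; the contest's exact input format and connectivity rule are not stated here, and the function name `label_components` is purely illustrative.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground regions of a binary image.

    `image` is a list of equal-length rows; nonzero pixels are foreground.
    Returns (labels, count), where labels has the same shape as image,
    background pixels are 0 and components are numbered from 1.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have a label.
            if image[r][c] == 0 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill from the newly found seed pixel.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] != 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

The flood fill is inherently sequential, which is part of why a fast GPU version is a real challenge: parallel approaches typically have to propagate or merge labels across many pixels at once instead.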

TopCoder is a large community of over 200,000 members, of whom over 32,000 have been active participants in the last 90 days. Anyone can register as a TopCoder member to participate in the CUDA SuperHero Challenge. Contestants around the world will be competing for some hard cash, and also for the opportunity to be TopCoder’s first CUDA SuperHeroes.

The challenge is simple to understand, and it is in fact simple to get a first implementation up and running; but winning will take plenty of CUDA skill, as the challenge will exercise many CUDA optimization techniques.

The winners will be announced at the NVIDIA GPU Technology Conference at the end of September. See the contest details here.

NVIDIA Announces Performance Primitives (NVPP) Library

June 8th, 2009

NVIDIA NVPP is a library of functions for performing CUDA accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NVPP will evolve over time to encompass more of the compute heavy tasks in a variety of problem domains. The NVPP library is written to maximize flexibility, while maintaining high performance.

NVPP can be used in one of two ways:

  • A stand-alone library for adding GPU acceleration to an application with minimal effort. Using this route allows developers to add GPU acceleration to their applications in a matter of hours.
  • A cooperative library for interoperating with a developer’s GPU code efficiently.

Either route allows developers to harness the massive compute resources of NVIDIA GPUs while reducing development time. The NVPP API matches the Intel Performance Primitives (IPP) library API, so porting existing IPP code to the GPU is easy. For more information and to sign up for access to the beta release of NVPP, visit the NVPP website.

GPU VSIPL Library

March 31st, 2009

GPU VSIPL is an implementation of the Vector Signal Image Processing Library (VSIPL) that targets Graphics Processing Units (GPUs) supporting NVIDIA’s CUDA platform. By leveraging processors capable of 900 GFLOP/s or more, your application may achieve considerable speedup without any specialized development for GPUs. The GPU VSIPL range-Doppler map application achieved a 75x speedup on the GPU simply by linking it with GPU VSIPL.

GPU VSIPL is currently released as a static library, and all releases are verified with the VSIPL Core Lite Test Suite.

GPU VSIPL was presented at the High Performance Embedded Computing Workshop 2008. Read the GPU VSIPL extended abstract [PDF]. For more information, visit the GPU VSIPL Website.

High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy

August 11th, 2008

This paper describes an implementation of fast deformable image registration using GPUs and CUDA for radiation therapy. On lung and prostate volumetric imaging, the GPU implementation is 40-66 times faster than a single-threaded CPU implementation and 25-41 times faster than a multithreaded implementation. The paradigm of GPU-based near-real-time deformable image registration opens up a host of clinical applications for medical imaging. (High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, John D. Owens. Medical Physics, 2008.)

GRIP – A Rugged GPU Accelerated Image Processing System

April 23rd, 2008

Vision4ce launched a new line of General-purpose Rugged Image Processing (GRIP) products at the recent SPIE Defense and Security Symposium in Orlando from 18th-20th March 2008. The GRIP-Beta showed cutting edge GPGPU-based image processing demonstrations, analog and Gigabit Ethernet video streams and the robust functionality in the Gripworkx image processing framework. The Vision4ce team with GRIP now addresses numerous rugged embedded computing challenges with a cost effective, readily available rugged solution that might normally be served by more expensive and lengthy FPGA approaches. See www.vision4ce.com for more information.

GPGPU Based Image Segmentation Livewire Algorithm Implementation

April 1st, 2008

This thesis presents a GPU implementation of the Livewire algorithm. The algorithm is divided into three phases: Sobel or Laplacian filter convolution, modeling the image as a grid graph, and solving the single-source shortest path problem with non-negative edge weights. To calculate the shortest path, an adapted version of the delta-stepping algorithm was developed for GPUs using CUDA. A critical analysis of the results shows large speedups for the image filtering algorithms. On the other hand, heavy use of dependent device memory look-ups has kept the delta-stepping algorithm from outperforming the CPU implementation, although better performance is expected for wider graphs. Besides showing the viability of implementing the Livewire algorithm on the GPU, this thesis makes available an open-source GPU-based image segmentation application, which can serve as an example for future GPU algorithm implementations, at http://code.google.com/p/gpuwire/.
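
The three phases can be illustrated with a small CPU sketch. This is not the thesis code: it runs on the CPU, it substitutes plain Dijkstra for the adapted delta-stepping algorithm, and the cost function (low cost on strong edges) and the names `sobel_magnitude` and `livewire_path` are assumptions made for illustration only.

```python
import heapq


def sobel_magnitude(img):
    """Phase 1: Sobel gradient magnitude; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def livewire_path(img, seed, target):
    """Return the cheapest 4-connected pixel path from seed to target.

    Pixels are (row, col) tuples. Stepping onto a pixel costs less when
    its gradient magnitude is high, so the path tends to follow edges.
    """
    mag = sobel_magnitude(img)
    peak = max(max(row) for row in mag) or 1.0
    # Phase 2: grid graph with non-negative weights (assumed cost function).
    cost = [[1.0 - m / peak + 1e-3 for m in row] for row in mag]
    rows, cols = len(cost), len(cost[0])
    # Phase 3: single-source shortest path (Dijkstra as a stand-in).
    dist = {seed: 0.0}
    prev = {}
    heap = [(0.0, seed)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale heap entry
        y, x = node
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols:
                nd = d + cost[ny][nx]
                if nd < dist.get((ny, nx), float("inf")):
                    dist[(ny, nx)] = nd
                    prev[(ny, nx)] = node
                    heapq.heappush(heap, (nd, (ny, nx)))
    # Walk the predecessor links back from the target to the seed.
    path = [target]
    while path[-1] != seed:
        path.append(prev[path[-1]])
    return path[::-1]
```

Dijkstra processes one vertex at a time, which is why a bucket-based method such as delta-stepping is the more natural starting point for a parallel GPU version.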

Modal Fourier wavefront reconstruction using GPUs

April 24th, 2007

This work approaches the fundamental problem of accelerating FFT computation with GPUs in order to apply it to Adaptive Optics, the key to obtaining maximum performance from projected ground-based eXtremely Large Telescopes. A method to efficiently adapt the FFT to the underlying architecture of GPUs is given. The authors derive a novel FFT method that alternates base-2 and base-4 decomposition of the bidimensional domain to make the most of the Multiple Render Target extension, as they elaborate a very unusual Pease 8-data “butterfly”. (Modal Fourier wavefront reconstruction using GPUs. J.G. Marichal-Hernandez, J.M. Rodriguez-Ramos, F. Rosa. La Laguna University. To appear in Journal of Electronic Imaging.)
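
As background on what a butterfly decomposition looks like, here is a textbook one-dimensional radix-2 (base-2) Cooley-Tukey FFT. It is only a reference sketch: it is not the authors' mixed base-2/base-4 two-dimensional method, it does not use the Pease formulation, and it assumes the input length is a power of two. The name `fft_radix2` is illustrative.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    `x` is a sequence of numbers whose length is a power of two.
    Returns the discrete Fourier transform as a list of complex values.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine one even and one odd term with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

A radix-4 step combines four inputs per butterfly instead of two, which halves the number of passes over the data.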

GPUCV: A free GPU-accelerated library for image processing and computer vision

April 2nd, 2007

GPUCV is a free GPU-accelerated library for image processing and computer vision. It offers an Intel OPENCV-like programming interface for easily porting existing applications. A one-page description is available. A longer presentation and discussion was published at IEEE ICME 2006. (J.-P. Farrugia, P. Horain, E. Guehenneux, Y. Allusse, “GPUCV: A framework for image processing acceleration with graphics processors”, CDROM proc. of the IEEE International Conference on Multimedia & Expo, July 9-12, 2006, Toronto, Ontario, Canada.)

Interactive Depth of Field Using Simulated Diffusion on a GPU

January 18th, 2007

This Pixar Animation Studios Technical Report by Kass, Lefohn, and Owens describes a GPU-based data-parallel direct tridiagonal linear solver. To the authors’ knowledge, this is the first reported direct, linear-time tridiagonal GPU solver. The solver is used to implement a new heat-diffusion-based depth-of-field preview algorithm; and the paper describes solving thousands of tridiagonal systems, each with hundreds of elements, on the GPU at interactive rendering rates. The alternating direction implicit solution gives rise to separable spatially varying recursive (infinite-impulse response, IIR) filters that can compute large-kernel convolutions in constant time per pixel while respecting the boundaries between in-focus and out-of-focus objects. Recursive filters have traditionally been viewed as problematic for GPUs, but using the well-established method of cyclic reduction of tridiagonal systems, the authors are able to parallelize the computation and implement an efficient solution in terms of GPGPU primitives. (Michael Kass, Aaron Lefohn, and John Owens. Interactive Depth of Field Using Simulated Diffusion on the GPU, Technical Report #06-01, Pixar Animation Studios, January 2006.)
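
The cyclic reduction idea mentioned above can be sketched on the CPU as follows. This is not the report's GPU solver: it is a sequential recursive version, it assumes a diagonally dominant system (no pivoting is done), and the name `cyclic_reduction` and the argument layout are illustrative. Row i of the system reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = 0 and c[n-1] = 0.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a, b, c are the sub-, main and super-diagonals (all of length n,
    with a[0] == 0 and c[n-1] == 0); d is the right-hand side.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Forward reduction: remove odd-indexed unknowns from even-indexed rows.
    # Each even row is independent, so on a GPU this loop runs in parallel.
    for i in range(0, n, 2):
        na, nb, nc, nd = 0.0, b[i], 0.0, d[i]
        if i > 0:
            alpha = a[i] / b[i - 1]
            na = -alpha * a[i - 1]
            nb -= alpha * c[i - 1]
            nd -= alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            nc = -gamma * c[i + 1]
            nb -= gamma * a[i + 1]
            nd -= gamma * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)
    # The even-indexed unknowns satisfy a tridiagonal system of half the size.
    x = [0.0] * n
    x[0::2] = cyclic_reduction(ra, rb, rc, rd)
    # Back substitution: recover each odd unknown from its two neighbours.
    for i in range(1, n, 2):
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x
```

Each level halves the system, so the number of levels grows with the logarithm of the system size.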

Hardware Efficient PDE Solvers in Quantized Image Processing

March 21st, 2005

This thesis by Robert Strzodka describes the design of robust quantized schemes and their hardware-efficient implementation on data-stream-based architectures for PDE-based image processing. The focus lies on enhancing both performance and accuracy through efficient use of appropriate hardware resources. Quantized schemes that, despite roundoff errors, preserve the qualitative behavior of the continuous models are constructed and examined on different GPUs, an FPGA and a reconfigurable array processor. The pros and cons of the hardware designs and the memory gap problem are discussed in detail. (Hardware Efficient PDE Solvers in Quantized Image Processing. Robert Strzodka. PhD thesis, University of Duisburg-Essen, 2004.)
