High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy

August 11th, 2008

This paper describes an implementation of fast deformable image registration for radiation therapy using GPUs and CUDA. On lung and prostate volumetric imaging, the GPU implementation is 40-66 times faster than a single-threaded CPU implementation and 25-41 times faster than a multithreaded implementation. GPU-based near-real-time deformable image registration opens up a host of clinical applications for medical imaging. (High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, John D. Owens. Medical Physics, 2008.)

GRIP – A Rugged GPU Accelerated Image Processing System

April 23rd, 2008

Vision4ce launched a new line of General-purpose Rugged Image Processing (GRIP) products at the SPIE Defense and Security Symposium in Orlando, held March 18th-20th, 2008. The GRIP-Beta was shown running GPGPU-based image processing demonstrations, handling analog and Gigabit Ethernet video streams, and using the Gripworkx image processing framework. According to Vision4ce, GRIP offers a cost-effective, readily available rugged solution for embedded computing problems that would normally call for more expensive FPGA approaches with longer development times. See www.vision4ce.com for more information.

GPGPU Based Image Segmentation Livewire Algorithm Implementation

April 1st, 2008

This thesis presents a GPU implementation of the Livewire algorithm. The algorithm is divided into three phases: Sobel or Laplacian filter convolution, modeling of the image as a grid graph, and solution of the single-source shortest path problem with non-negative edge weights. To calculate the shortest path, an adapted version of the delta-stepping algorithm was developed for GPUs using CUDA. A critical analysis of the results shows large speedups for the image filtering phase. On the other hand, heavy use of dependent device memory look-ups has kept the delta-stepping implementation from outperforming the CPU implementation, although better performance is expected for wider graphs. Besides showing the viability of a GPU Livewire implementation, this thesis makes available an open-source GPU-based image segmentation application, which can serve as an example for future GPU algorithm implementations, at http://code.google.com/p/gpuwire/.
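
The three phases can be illustrated with a small CPU-side sketch in Python. This is not the thesis code: the function names, the use of per-pixel costs of the form 1 / (1 + gradient magnitude), the 4-connected grid, and the simplified bucket handling (no light/heavy edge split) are all assumptions made for illustration.

```python
import math

def sobel_magnitude(img):
    """Phase 1: Sobel gradient magnitude with clamped borders."""
    h, w = len(img), len(img[0])
    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (px(y - 1, x + 1) + 2 * px(y, x + 1) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y, x - 1) - px(y + 1, x - 1))
            gy = (px(y + 1, x - 1) + 2 * px(y + 1, x) + px(y + 1, x + 1)
                  - px(y - 1, x - 1) - 2 * px(y - 1, x) - px(y - 1, x + 1))
            mag[y][x] = math.hypot(gx, gy)
    return mag

def node_costs(mag):
    """Phase 2 (assumed cost model): strong edges are cheap to walk along."""
    return [[1.0 / (1.0 + m) for m in row] for row in mag]

def delta_stepping(cost, source, delta):
    """Phase 3: simplified delta-stepping on a 4-connected grid graph.
    Entering pixel (y, x) costs cost[y][x]; all costs must be positive."""
    h, w = len(cost), len(cost[0])
    dist, prev = {source: 0.0}, {}
    buckets = {0: {source}}
    while buckets:
        k = min(buckets)                 # lowest non-empty bucket
        frontier = buckets.pop(k)
        while frontier:                  # repeat until bucket k is settled
            batch, frontier = frontier, set()
            for (y, x) in batch:         # relaxations a GPU could run in parallel
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        nd = dist[(y, x)] + cost[ny][nx]
                        if nd < dist.get((ny, nx), math.inf):
                            dist[(ny, nx)] = nd
                            prev[(ny, nx)] = (y, x)
                            nk = int(nd // delta)
                            if nk == k:
                                frontier.add((ny, nx))
                            else:
                                buckets.setdefault(nk, set()).add((ny, nx))
    return dist, prev

def livewire_path(img, seed, target, delta=0.05):
    """Chain the three phases and trace the path back from target to seed."""
    dist, prev = delta_stepping(node_costs(sobel_magnitude(img)), seed, delta)
    path = [target]
    while path[-1] != seed:
        path.append(prev[path[-1]])
    return path[::-1]
```

On an image with a single vertical intensity step, a path between two seeds placed on the step follows it. The bucket width `delta` only affects how much work is grouped per step, not the resulting distances.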

Modal Fourier wavefront reconstruction using GPUs

April 24th, 2007

This work addresses the fundamental problem of accelerating FFT computation with GPUs in order to apply it to Adaptive Optics, which is key to obtaining maximum performance from the planned ground-based Extremely Large Telescopes. A method to efficiently adapt the FFT to the underlying architecture of GPUs is given. The authors derive a novel FFT method that alternates base-2 and base-4 decomposition of the two-dimensional domain to make the most of the Multiple Render Target extension, developing a very unusual Pease 8-data “butterfly” in the process. (Modal Fourier wavefront reconstruction using GPUs. J.G. Marichal-Hernandez, J.M. Rodriguez-Ramos, F. Rosa. La Laguna University. To appear in Journal of Electronic Imaging.)
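
To give a feel for mixing base-2 and base-4 decompositions, here is a minimal one-dimensional sketch in Python. It is not the authors' method: it is a plain recursive decimation-in-time FFT that uses a base-4 split whenever the length allows and falls back to base-2 otherwise, and it says nothing about the two-dimensional layout, Multiple Render Targets, or the Pease butterfly. The function name `fft_mixed` is hypothetical.

```python
import cmath

def fft_mixed(x):
    """Recursive decimation-in-time FFT for power-of-two lengths.
    Splits by 4 when possible, otherwise by 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    radix = 4 if n % 4 == 0 else 2
    if n % radix != 0:
        raise ValueError("length must be a power of two")
    m = n // radix
    # Transform each interleaved sub-sequence x[r], x[r + radix], ...
    subs = [fft_mixed(x[r::radix]) for r in range(radix)]
    out = []
    for k in range(n):
        # Combine sub-transforms with twiddle factors exp(-2*pi*i*r*k/n);
        # each sub-transform is periodic with period m.
        out.append(sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
                       for r in range(radix)))
    return out
```

A length-8 input, for example, is split once by 4 and then once by 2. The combine step here is written for clarity rather than speed; a tuned implementation would reuse twiddle factors.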

GPUCV: A free GPU-accelerated library for image processing and computer vision

April 2nd, 2007

GPUCV is a free GPU-accelerated library for image processing and computer vision. It offers a programming interface similar to Intel's OpenCV, so that existing applications can be ported easily. A one-page description is available. A longer presentation and discussion was published at IEEE ICME 2006. (J.-P. Farrugia, P. Horain, E. Guehenneux, Y. Allusse, “GPUCV: A framework for image processing acceleration with graphics processors”, CDROM proc. of the IEEE International Conference on Multimedia & Expo, July 9-12, 2006, Toronto, Ontario, Canada.)

Interactive Depth of Field Using Simulated Diffusion on a GPU

January 18th, 2007

This Pixar Animation Studios Technical Report by Kass, Lefohn, and Owens describes a GPU-based data-parallel direct tridiagonal linear solver. To the authors’ knowledge, this is the first reported direct, linear-time tridiagonal GPU solver. The solver is used to implement a new heat-diffusion-based depth-of-field preview algorithm. The paper describes solving thousands of tridiagonal systems, each with hundreds of elements, on the GPU at interactive rendering rates. The alternating direction implicit solution gives rise to separable, spatially varying recursive (infinite impulse response, IIR) filters that can compute large-kernel convolutions in constant time per pixel while respecting the boundaries between in-focus and out-of-focus objects. Recursive filters have traditionally been viewed as problematic for GPUs, but by using the well-established method of cyclic reduction of tridiagonal systems, the authors are able to parallelize the computation and implement an efficient solution in terms of GPGPU primitives. (Michael Kass, Aaron Lefohn, and John Owens. Interactive Depth of Field Using Simulated Diffusion on the GPU, Technical Report #06-01, Pixar Animation Studios, January 2006.)
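
Cyclic reduction itself can be sketched in a few lines of sequential Python. This is an illustrative version only, not the report's GPU implementation: the function name and argument layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0]` and `c[-1]` ignored) are assumptions, and no pivoting is done, so it expects a well-conditioned system such as a diagonally dominant one.

```python
def solve_tridiagonal_cr(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by cyclic reduction."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Forward reduction: eliminate the odd-indexed unknowns from every
    # even-indexed equation. Each iteration is independent of the others,
    # which is what makes this step data-parallel.
    for i in range(0, n, 2):
        new_a = new_c = 0.0
        new_b, new_d = b[i], d[i]
        if i > 0:
            alpha = a[i] / b[i - 1]
            new_a = -alpha * a[i - 1]
            new_b -= alpha * c[i - 1]
            new_d -= alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_c = -gamma * c[i + 1]
            new_b -= gamma * a[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a); rb.append(new_b); rc.append(new_c); rd.append(new_d)
    # The even-indexed unknowns form a half-size tridiagonal system.
    x = [0.0] * n
    x[0::2] = solve_tridiagonal_cr(ra, rb, rc, rd)
    # Back substitution: recover each odd-indexed unknown from its neighbors.
    for i in range(1, n, 2):
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x
```

Each level halves the system, so the total work stays linear in the number of unknowns while the number of dependent steps grows only logarithmically.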

Hardware Efficient PDE Solvers in Quantized Image Processing

March 21st, 2005

This thesis by Robert Strzodka describes the design of robust quantized schemes and their hardware-efficient implementation on data-stream-based architectures for PDE-based image processing. The focus lies on enhancing both performance and accuracy through efficient use of appropriate hardware resources. Quantized schemes that preserve the qualitative behavior of the continuous models despite roundoff errors are constructed and examined on different GPUs, an FPGA and a reconfigurable array processor. The pros and cons of the hardware designs and the memory gap problem are discussed in detail. (Hardware Efficient PDE Solvers in Quantized Image Processing. Robert Strzodka. PhD thesis, University of Duisburg-Essen, 2004.)

Image Registration by a Regularized Gradient Flow

January 5th, 2005

To correlate the intensities in two images, an energy functional is successively minimized in a variational setting. The gradient flow formulation makes use of a robust multi-scale regularization, an efficient multi-grid solver and an adaptive time-step control. On the GPU, the multi-scale approach maps to a packed multi-grid pyramid with several scales per grid level. The algorithm uses three nested loops: the regularized multi-scale descent, the iterative solution of the gradient flow PDE, and on the third level the multi-grid smoother and the adaptive time-step iteration. (Image Registration by a Regularized Gradient Flow – A Streaming Implementation in DX9 Graphics Hardware. Robert Strzodka, Marc Droske and Martin Rumpf. Computing, 73(4), 373-389, Springer, 2004.)

Accelerating Morphological Analysis with Graphics Hardware

February 19th, 2004

This paper from the VIS Group Stuttgart describes the acceleration of morphological operators using graphics hardware and OpenGL. Because the problem is mainly bound by memory bandwidth, a solution based on graphics hardware can significantly reduce computation time in the filtering step, as graphics hardware typically has much broader and faster memory paths. Accuracy can be a problem when fixed-point graphics hardware is used for mathematical computations. However, morphological operators map well onto the graphics pipeline, resulting in no loss of accuracy. See also the project page for more about hardware-based filtering. (Accelerating Morphological Analysis with Graphics Hardware. Matthias Hopf and Thomas Ertl. Workshop on Vision, Modelling, and Visualization 2000, pp. 337-345.)
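
A short Python sketch shows why these operators lose no accuracy on fixed-point data: grayscale erosion and dilation only take the minimum or maximum of existing pixel values, so no arithmetic rounding occurs. The 3x3 square structuring element, the border handling (neighborhoods clipped at the image edge), and the function names are assumptions for illustration, not details from the paper.

```python
def _morph(img, pick):
    """Apply `pick` (min or max) over each pixel's 3x3 neighborhood,
    clipping the neighborhood at the image border."""
    h, w = len(img), len(img[0])
    return [[pick(img[yy][xx]
                  for yy in range(max(y - 1, 0), min(y + 2, h))
                  for xx in range(max(x - 1, 0), min(x + 2, w)))
             for x in range(w)]
            for y in range(h)]

def erode(img):
    """Grayscale erosion: each pixel becomes the neighborhood minimum."""
    return _morph(img, min)

def dilate(img):
    """Grayscale dilation: each pixel becomes the neighborhood maximum."""
    return _morph(img, max)

def opening(img):
    """Erosion followed by dilation; removes small bright specks."""
    return dilate(erode(img))
```

Every output value is one of the input values, so 8-bit input yields exact 8-bit output. The many neighboring reads per pixel also illustrate why the operation is limited by memory bandwidth rather than arithmetic.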

Fast and Accurate Color Image Processing Using 3D Graphics Cards

September 11th, 2003

This paper by Colantoni et al. tests five color image processing algorithms (local mean filtering, RGB to L*a*b* and RGB to HSV color space conversions, local principal component analysis and anisotropic diffusion filtering). Each algorithm has been implemented twice: the first implementation uses the 3D card (NV30 GPU with Cg), while the second is optimized for the CPU (P4 or Athlon). This allows comparison and analysis of the performance obtained from each version. (Fast and Accurate Color Image Processing Using 3D Graphics Cards. Philippe Colantoni, Nabil Boukala, and Jérôme da Rugna. To appear in the 8th International Fall Workshop on Vision, Modeling and Visualization (VMV 2003).)
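
As an example of the kind of per-pixel computation involved, here is a minimal Python sketch of an RGB to HSV conversion. It follows the common textbook formulation with all channels in the range 0 to 1 and is not taken from the paper; the function name and the choice to return hue as a fraction of a full turn are assumptions.

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (h, s, v),
    with hue h expressed as a fraction of a full turn in [0, 1)."""
    mx, mn = max(r, g, b), min(r, g, b)
    chroma = mx - mn
    v = mx
    s = 0.0 if mx == 0 else chroma / mx
    if chroma == 0:
        h = 0.0                       # gray: hue is undefined, use 0
    elif mx == r:
        h = ((g - b) / chroma) % 6    # between magenta and yellow
    elif mx == g:
        h = (b - r) / chroma + 2      # between yellow and cyan
    else:
        h = (r - g) / chroma + 4      # between cyan and magenta
    return h / 6.0, s, v

def image_to_hsv(img):
    """Apply the conversion independently to every pixel of a 2D image."""
    return [[rgb_to_hsv(*pixel) for pixel in row] for row in img]
```

Each output pixel depends only on the matching input pixel, which is the access pattern a fragment program handles naturally.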
