Recently, general-purpose computing on graphics processing units (GPGPU) has been enabled on mobile devices thanks to the emerging heterogeneous programming models such as OpenCL. The capability of GPGPU on mobile devices opens a new era for mobile computing and can enable many computationally demanding computer vision algorithms on mobile devices. As a case study, this paper proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL. We discuss the methodology of exploring the parallelism in the algorithm as well as several optimization techniques. Experimental results demonstrate that our optimization strategies for mobile GPUs have significantly reduced the processing time and make computationally intensive computer vision algorithms feasible for a mobile device. To the best of the authors’ knowledge, this work is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.
(Guohui Wang, Yingen Xiong, Jay Yun and Joseph R. Cavallaro: “Accelerating Computer Vision Algorithms Using OpenCL on the Mobile GPU – A Case Study”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013, to appear. [PDF])
TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications. CUVILib adds acceleration to imaging applications in the medical, industrial and defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction, etc.), linear/non-linear filtering, feature extraction & tracking, motion estimation, image transforms and image statistics.
Modern GPUs are well suited for performing image processing tasks. We utilize their high computational performance and memory bandwidth for image segmentation purposes. We segment cardiac MRI data by means of numerical solution of an anisotropic partial differential equation of the Allen-Cahn type. We implement two different algorithms for solving the equation on the CUDA architecture. One of them is based on the Runge-Kutta-Merson method for the approximation of solutions of ordinary differential equations, the other uses the GMRES method for the numerical solution of systems of linear equations. In our experiments, the CUDA implementations of both algorithms are about 3–9 times faster than corresponding 12-threaded OpenMP implementations.
(Oberhuber T., Suzuki A., Vacata J., Žabka V., “Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods”, Journal of Math-for-Industry, 2011, vol. 3, pp. 73–79 [PDF])
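For readers unfamiliar with the Runge-Kutta-Merson method named in the abstract above, here is a minimal sketch of one Merson step with its built-in error estimate, applied to a scalar ODE. It is not the authors' CUDA code: the function names, the test problem, and the step-size rule (safety factor 0.8, growth limits 0.2 and 5) are assumptions made for illustration.

```python
def merson_step(f, t, y, h):
    """One Runge-Kutta-Merson step for y' = f(t, y); returns (y_new, error_estimate)."""
    k1 = f(t, y)
    k2 = f(t + h / 3.0, y + h * k1 / 3.0)
    k3 = f(t + h / 3.0, y + h * (k1 + k2) / 6.0)
    k4 = f(t + h / 2.0, y + h * (k1 + 3.0 * k3) / 8.0)
    k5 = f(t + h, y + h * (0.5 * k1 - 1.5 * k3 + 2.0 * k4))
    y_new = y + h * (k1 + 4.0 * k4 + k5) / 6.0
    # Merson's local error estimate, built from the stages already computed.
    err = abs(h * (2.0 * k1 - 9.0 * k3 + 8.0 * k4 - k5) / 30.0)
    return y_new, err


def integrate(f, t0, y0, t_end, h=0.1, tol=1e-8):
    """Adaptive integration from t0 to t_end (illustrative step-size control)."""
    t, y = t0, y0
    while t < t_end:
        h = min(h, t_end - t)          # do not step past the end point
        y_new, err = merson_step(f, t, y, h)
        if err <= tol:                 # accept the step
            t, y = t + h, y_new
        # Assumed controller: rescale h from the error, limited to [0.2, 5].
        scale = 0.8 * (tol / err) ** 0.2 if err > 0.0 else 5.0
        h *= min(5.0, max(0.2, scale))
    return y
```

As a usage example, `integrate(lambda t, y: -y, 0.0, 1.0, 1.0)` approximates exp(-1). On a GPU the same stage computations would be applied to every grid point of the discretized equation in parallel.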
Many image processing applications use the histogramming algorithm, which fills a set of bins according to the frequency of occurrence of pixel values taken from an input image. Histogramming has been mapped on a GPU prior to this work. Although significant research effort has been spent in optimizing the mapping, we show that the performance and performance predictability of existing methods can still be improved.
In this paper, we present two novel histogramming methods, both achieving a higher performance and predictability than existing methods. We discuss performance limitations for both novel methods by exploring algorithm trade-offs.
The first novel method gives an average performance increase of 33% over existing methods for non-synthetic benchmarks. The second novel method gives an average performance increase of 56% over existing methods and is guaranteed to be fully data independent. While the second method is specifically designed for Fermi GPU architectures, the first method is also suitable for older architectures.
(Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal, Bart Mesman: “High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs”, GPGPU-4: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. [DOI] [Paper and Source Code])
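To make the histogramming operation concrete, the sketch below shows a plain sequential histogram next to a partitioned variant in which each worker fills a private sub-histogram that is merged at the end. That partition-and-merge structure is common in GPU mappings, but this is neither of the two methods from the paper. The function names and the `num_workers` parameter are hypothetical, and pixel values are assumed to be integers in the range 0 to `num_bins - 1`.

```python
def histogram(pixels, num_bins=256):
    """Sequential reference: count how often each pixel value occurs."""
    bins = [0] * num_bins
    for p in pixels:
        bins[p] += 1
    return bins


def histogram_partitioned(pixels, num_bins=256, num_workers=4):
    """Each (simulated) worker fills a private sub-histogram; the results are then merged."""
    partial = [[0] * num_bins for _ in range(num_workers)]
    for i, p in enumerate(pixels):
        # Pixels are dealt to workers round-robin; on a GPU each worker
        # would be a thread or thread block with its own on-chip bins.
        partial[i % num_workers][p] += 1
    # Merge step: sum the private bins column by column.
    return [sum(column) for column in zip(*partial)]
```

Private sub-histograms avoid conflicting updates to the same bin, at the price of a final merge. How much such conflicts cost depends on the pixel distribution, which is the data dependence the abstract refers to.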
Although trivial background subtraction (BGS) algorithms (e.g. frame differencing, running average…) can perform quite fast, they are not robust enough to be used in various computer vision problems. Some complex algorithms usually give better results, but are too slow to be applied to real-time systems. We propose an improved version of the Extended Gaussian mixture model that utilizes the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation running on a low-end GeForce 9600GT GPU provides at least 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.
(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
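For context, the running-average approach that the abstract above lists among the trivial background subtraction methods can be sketched in a few lines. This is the simple baseline, not the Extended Gaussian mixture model of the paper. Frames are assumed to be flat lists of grayscale values, and the function names, `alpha` and `threshold` are illustrative choices.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (exponential running average)."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose difference from the background model exceeds the threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]
```

Every pixel is processed independently, so the method maps directly onto one GPU thread per pixel. A mixture model keeps the same per-pixel structure but maintains several weighted Gaussians per pixel instead of a single average.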
The Euler-Lagrange (EL) framework is the most widely-used strategy for solving variational optic flow methods. We present the first approach that solves the EL equations of state-of-the-art methods on sequences with 640×480 pixels in near-realtime on GPUs. This performance is achieved by combining two ideas: (i) We extend the recently proposed Fast Explicit Diffusion (FED) scheme to optic flow, and additionally embed it into a coarse-to-fine strategy. (ii) We parallelise our complete algorithm on a GPU, where a careful optimisation of global memory operations and an efficient use of on-chip memory guarantee a good performance. Applying our approach to the variational ‘Complementary Optic Flow’ method (Zimmer et al. (2009)), we obtain highly accurate flow fields in less than a second. This currently constitutes the fastest method in the top 10 of the widely used Middlebury benchmark.
TunaCode has announced the release of CUVI Lib v0.3 (Beta version) for Windows 32 and 64 Systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
Taking inspiration from genetic screening techniques, researchers from MIT and Harvard have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance gaming hardware.
The neural processing involved in visually recognizing even the simplest object in a natural environment is profound — and profoundly difficult to mimic. Neuroscientists have made broad advances in understanding the visual system, but much of the inner workings of biologically based systems remain a mystery.
Using Graphics Processing Units (GPUs) — the same technology video game designers use to render life-like graphics — MIT and Harvard researchers are now making progress faster than ever before. “We made a powerful computing system that delivers over hundred fold speed-ups relative to conventional methods,” said Nicolas Pinto, a PhD candidate in James DiCarlo’s lab at the McGovern Institute for Brain Research at MIT. “With this extra computational power, we can discover new vision models that traditional methods miss.” Pinto co-authored the PLoS study with David Cox of the Visual Neuroscience Group at the Rowland Institute at Harvard.
This paper by Wojek et al. presents a fast object class localization framework from TU Darmstadt implemented on a data parallel architecture currently available in recent computers. Our case study, the implementation of Histograms of Oriented Gradients (HOG) descriptors, shows that just by using this recent programming model we can easily speed up an original CPU-only implementation by a factor of 34 (with disk IO) / 109 (processing only), making it unnecessary to use early rejection cascades that sacrifice classification performance, even in real-time conditions. Using recent techniques to program the Graphics Processing Unit (GPU) allows our method to scale up to the latest, as well as to future improvements of the hardware.
(Sliding-Windows for Rapid Object Class Localization: a Parallel Technique. C. Wojek, G. Dorko, A. Schulz, B. Schiele. 30th DAGM Symposium (DAGM 2008), pp. 71-81, Munich, Germany)
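The core of a HOG descriptor, as mentioned in the item above, is a per-cell histogram of gradient orientations. The sketch below is a simplified illustration rather than the paper's implementation. It assumes central-difference gradients, unsigned orientations in the range 0 to 180 degrees, 9 bins, and magnitude-weighted votes without interpolation or block normalization. The function name is hypothetical.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a list of rows of grayscale values; border pixels are skipped
    because central differences need both neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / bin_width) % num_bins] += magnitude
    return hist
```

Each cell, and each sliding-window position built from cells, can be evaluated independently, which is what makes the descriptor a good fit for a data parallel architecture.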