Many image processing applications use the histogramming algorithm, which fills a set of bins according to the frequency of occurrence of pixel values taken from an input image. Histogramming has been mapped on a GPU prior to this work. Although significant research effort has been spent in optimizing the mapping, we show that the performance and performance predictability of existing methods can still be improved.
In this paper, we present two novel histogramming methods, both achieving a higher performance and predictability than existing methods. We discuss performance limitations for both novel methods by exploring algorithm trade-offs.
The first novel method gives an average performance increase of 33% over existing methods for non-synthetic benchmarks. The second novel method gives an average performance increase of 56% over existing methods and guarantees to be fully data independent. While the second method is specifically designed for Fermi GPU architectures, the first method is also suitable for older architectures.
(Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal, Bart Mesman: “High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs”, GPGPU-4: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. [DOI] [Paper and Source Code])
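For readers new to the problem, the basic operation and one common way to parallelise it can be sketched as follows. This is not either of the paper's two methods; it is a generic "private sub-histogram" pattern, simulated sequentially in Python. The function name, the strided pixel assignment and the worker count are assumptions made for illustration.

```python
def histogram_privatized(pixels, num_bins=256, num_workers=4):
    """Histogram pixel values using per-worker private bins, then merge them."""
    # Each simulated worker owns a private sub-histogram, so no two workers
    # ever update the same counter (on a GPU this avoids conflicting updates).
    private = [[0] * num_bins for _ in range(num_workers)]
    for i, value in enumerate(pixels):
        worker = i % num_workers  # strided assignment of pixels to workers
        private[worker][value] += 1
    # Reduction step: sum the private sub-histograms bin by bin.
    return [sum(sub[b] for sub in private) for b in range(num_bins)]


image = [0, 0, 3, 255, 3, 3, 7, 0]
hist = histogram_privatized(image)
```

With shared bins, the time a real GPU spends on conflicting updates depends on how often neighbouring pixels hit the same bin. That is one reason histogram performance can vary with the input data.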
Although trivial background subtraction (BGS) algorithms (e.g. frame differencing, running average…) can perform quite fast, they are not robust enough to be used in various computer vision problems. Some complex algorithms usually give better results, but are too slow to be applied to real-time systems. We propose an improved version of the Extended Gaussian mixture model that utilizes the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation running on a low-end GeForce 9600GT GPU provides at least 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.
(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
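As a point of reference, the "trivial" running-average baseline named in the abstract can be sketched in a few lines. This is not the Extended Gaussian mixture model from the paper. The learning rate `alpha`, the `threshold` value and the function names are illustrative assumptions, and frames are flattened to plain lists of grey values.

```python
def update_background(background, frame, alpha=0.05):
    """Exponential running average per pixel: B <- (1 - alpha) * B + alpha * F."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30.0):
    """Flag a pixel as foreground when it deviates from the model by more than threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


background = [100.0, 100.0, 100.0, 100.0]
frame = [102, 99, 180, 100]
mask = foreground_mask(background, frame)          # only the third pixel is flagged
background = update_background(background, frame)  # model drifts slowly toward the frame
```

Every pixel is processed independently, which is why this family of algorithms maps naturally onto one GPU thread per pixel.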
The Euler-Lagrange (EL) framework is the most widely-used strategy for solving variational optic flow methods. We present the first approach that solves the EL equations of state-of-the-art methods on sequences with 640×480 pixels in near-realtime on GPUs. This performance is achieved by combining two ideas: (i) We extend the recently proposed Fast Explicit Diffusion (FED) scheme to optic flow, and additionally embed it into a coarse-to-fine strategy. (ii) We parallelise our complete algorithm on a GPU, where a careful optimisation of global memory operations and an efficient use of on-chip memory guarantee a good performance. Applying our approach to the variational ‘Complementary Optic Flow’ method (Zimmer et al. (2009)), we obtain highly accurate flow fields in less than a second. This currently constitutes the fastest method in the top 10 of the widely used Middlebury benchmark.
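The abstract does not spell out the FED scheme. The sketch below reproduces the step sizes of one FED cycle as we recall them from the FED literature, so treat the formula as an assumption to check against the original papers. The optic-flow extension and the coarse-to-fine embedding are not shown. Here `tau_max` stands for the stability limit of the ordinary explicit scheme, and the function name is hypothetical.

```python
import math


def fed_step_sizes(n, tau_max):
    """Step sizes of one FED cycle with n explicit steps and stability limit tau_max."""
    return [
        tau_max / (2.0 * math.cos(math.pi * (2 * i + 1) / (4 * n + 2)) ** 2)
        for i in range(n)
    ]


taus = fed_step_sizes(n=5, tau_max=0.25)
cycle_time = sum(taus)  # equals tau_max * (n * n + n) / 3 up to rounding
```

Some individual steps exceed `tau_max`, yet the cycle as a whole covers a diffusion time that grows quadratically with `n`. That is the source of the speed-up over using `n` steps of size `tau_max`.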
TunaCode has announced the release of CUVI Lib v0.3 (Beta version) for 32-bit and 64-bit Windows systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
Taking inspiration from genetic screening techniques, researchers from MIT and Harvard have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance gaming hardware.
The neural processing involved in visually recognizing even the simplest object in a natural environment is profound — and profoundly difficult to mimic. Neuroscientists have made broad advances in understanding the visual system, but much of the inner workings of biologically based systems remain a mystery.
Using Graphics Processing Units (GPUs) — the same technology video game designers use to render life-like graphics — MIT and Harvard researchers are now making progress faster than ever before. “We made a powerful computing system that delivers over hundred fold speed-ups relative to conventional methods,” said Nicolas Pinto, a PhD candidate in James DiCarlo’s lab at the McGovern Institute for Brain Research at MIT. “With this extra computational power, we can discover new vision models that traditional methods miss.” Pinto co-authored the PLoS study with David Cox of the Visual Neuroscience Group at the Rowland Institute at Harvard.
This paper by Wojek et al. presents a fast object class localization framework from TU Darmstadt implemented on a data parallel architecture currently available in recent computers. Our case study, the implementation of Histograms of Oriented Gradients (HOG) descriptors, shows that just by using this recent programming model we can easily speed up an original CPU-only implementation by a factor of 34 (with disk IO) / 109 (processing only), making it unnecessary to use early rejection cascades that sacrifice classification performance, even in real-time conditions. Using recent techniques to program the Graphics Processing Unit (GPU) allows our method to scale up to the latest, as well as to future improvements of the hardware. (Sliding-Windows for Rapid Object Class Localization: a Parallel Technique. C. Wojek, G. Dorko, A. Schulz, B. Schiele. 30th DAGM Symposium (DAGM 2008), pp. 71-81, Munich, Germany)
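The core of a HOG descriptor is a per-cell histogram of gradient orientations. The following is a simplified sketch, not the implementation from the paper: it uses hard binning into 9 unsigned-orientation bins and leaves out vote interpolation and block normalisation. The function name and the bin count are assumptions chosen for illustration.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    # Centered differences need both neighbours, so border pixels are skipped.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            # The final modulo guards against an angle that rounds to exactly 180.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# A vertical edge: intensity changes only along x, so all votes land in bin 0.
patch = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(patch)
```

In a sliding-window detector this computation is repeated for every cell of every window, which makes it a natural fit for data-parallel hardware.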
GPU4Vision is a project founded by the Institute for Computer Graphics and Vision, Graz University of Technology dealing with fast computer vision algorithms for tasks like basic image processing, segmentation, motion, stereo etc. On the GPU4Vision website you can take a look at the project’s latest scientific publications, watch demo videos of algorithms and even download and evaluate some of them on your own PC. (GPU4Vision – Website)
This work describes the implementation of a real-time visual tracker that targets the position and 3D pose of objects (specifically faces) in video sequences. The use of GPUs for the computation and efficient sparse-template-based particle filtering allows real-time processing even when tracking multiple faces simultaneously in high-resolution video frames. Using a GPU and the NVIDIA CUDA technology, performance improvements as large as ten times compared to a similar CPU-only tracker are achieved. (Real-time Visual Tracker by Stream Processing. Oscar Mateo Lozano and Kazuhiro Otsuka. Journal of Signal Processing Systems.)
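The predict, weight and resample cycle at the heart of particle filtering can be sketched for a toy one-dimensional position. This is a generic bootstrap filter, not the sparse-template tracker from the paper. The random-walk motion model, the Gaussian likelihood, the noise parameters and all names are illustrative assumptions.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std=1.0, meas_std=2.0, rng=random):
    """One predict / weight / resample cycle for a 1D position."""
    # Predict: diffuse each particle with a random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, measurement=42.0, rng=rng)
estimate = sum(particles) / len(particles)  # mean of the resampled particle set
```

The prediction and weighting steps treat every particle independently, so they parallelise well. Resampling needs the sum of all weights and is usually the harder step to map onto a GPU.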
This paper by Cabido et al. presents a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1100% faster than the CPU version of the algorithm. (Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs. Raul Cabido, Antonio S. Montemayor, Juan Jose Pantrigo, and Bryson R. Payne. Machine Vision and Applications, Springer, 2008.)