Call for Participation: Using GPUs for Vision
January 23rd, 2011
This meeting is organized by Toby Breckon & Stuart Barnes (Cranfield University) and the British Machine Vision Association and Society for Pattern Recognition. It will be held in London, UK, on 18 May 2011. The CfP poster is available at http://www.cranfield.ac.uk/~toby.breckon/events/bmva_symp_gpu11.pdf.
GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction
January 12th, 2011
Abstract:
Although trivial background subtraction (BGS) algorithms (e.g., frame differencing or running average) run quite fast, they are not robust enough for many computer vision problems. More complex algorithms usually give better results but are too slow for real-time systems. We propose an improved version of the Extended Gaussian mixture model that uses the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation, running on a low-end GeForce 9600GT GPU, provides at least a 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.
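The abstract does not spell out the model, so the following is only a sketch of the general idea: a simplified, fixed-size Gaussian mixture for a single grayscale pixel in the style of Stauffer and Grimson. It is not the authors' Extended GMM and not their GPU code. The class names and the parameter names and values (`alpha`, `match_sigma`, `bg_ratio`, `init_var`, `min_var`) are hypothetical, chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


@dataclass
class PixelGMM:
    """Adaptive Gaussian mixture for one grayscale pixel (illustrative sketch)."""
    max_components: int = 3   # K, assumed
    alpha: float = 0.01       # learning rate, assumed
    match_sigma: float = 2.5  # match if |x - mean| < match_sigma * std
    bg_ratio: float = 0.9     # weight mass treated as background, assumed
    init_var: float = 225.0   # variance of a newly created component, assumed
    min_var: float = 4.0      # floor that keeps variances from collapsing
    comps: list = field(default_factory=list)

    def update(self, x: float) -> bool:
        """Fold intensity x into the model; return True if x is foreground."""
        comps = self.comps
        # Most reliable background components first (high weight, low spread).
        comps.sort(key=lambda g: g.weight / math.sqrt(g.var), reverse=True)
        # The first n_bg components, whose weights first exceed bg_ratio, model background.
        n_bg, acc = 0, 0.0
        for g in comps:
            n_bg += 1
            acc += g.weight
            if acc > self.bg_ratio:
                break
        # Index of the first component within match_sigma standard deviations of x.
        match = next((i for i, g in enumerate(comps)
                      if (x - g.mean) ** 2 < self.match_sigma ** 2 * g.var), None)
        # Every weight decays by (1 - alpha).
        for g in comps:
            g.weight *= 1.0 - self.alpha
        if match is None:
            # An unmatched value starts a new component, replacing the weakest if full.
            new = Gaussian(self.alpha, x, self.init_var)
            if len(comps) < self.max_components:
                comps.append(new)
            else:
                comps[-1] = new
            foreground = True
        else:
            # The matched component is reinforced and pulled toward x.
            g = comps[match]
            g.weight += self.alpha
            rho = self.alpha / g.weight
            d = x - g.mean
            g.mean += rho * d
            g.var = max(self.min_var, g.var + rho * (d * d - g.var))
            foreground = match >= n_bg
        # Renormalize so the weights sum to 1.
        total = sum(g.weight for g in comps)
        for g in comps:
            g.weight /= total
        return foreground
```

A full frame would keep one `PixelGMM` per pixel. Because each pixel is updated independently of its neighbors, a GPU version would typically assign one thread per pixel, which is what makes this family of methods a natural fit for GPUs.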
(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
CUVI Lib – CUDA for Vision and Imaging Library Launched
August 1st, 2010
TunaCode has announced the release of CUVI Lib v0.3 (Beta) for 32-bit and 64-bit Windows systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
- Optical Flow (Horn & Schunck)
- Optical Flow (Lucas & Kanade)
- Discrete Wavelet Transform (Forward and Inverse)
- Hough Transform
- Hough Lines (Lines Detector)
- Color Conversion (RGB-to-Gray and RGBA-to-Gray)
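CUVI Lib's own function signatures are not given here, so rather than guess at its API, the sketch below shows the standard (θ, ρ) voting scheme that line detectors such as the listed Hough Lines function are commonly built on. It is a plain CPU illustration: `hough_peak` and its parameters are hypothetical names, and `n_theta` sets the angular resolution (180 bins gives one-degree steps).

```python
import math
from collections import Counter


def hough_peak(points, n_theta=180):
    """Vote each (x, y) edge point into a (theta, rho) accumulator.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta), with theta
    in [0, 180) degrees. Returns (theta_degrees, rho, votes) of the strongest cell.
    """
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] += 1
    (t, rho), votes = acc.most_common(1)[0]
    return 180.0 * t / n_theta, rho, votes
```

For example, the points of the horizontal line y = 5 all vote for theta = 90°, rho = 5. Each edge point votes independently, which is why the voting step parallelizes well; on a GPU, concurrent votes into the same accumulator cell would typically need atomic additions.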
Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded here. Forums to discuss feedback and further ideas are available.
Image Processing with CUDA Courses following the GTC
July 4th, 2010
SagivTech plans to offer a 3-day course on Image Processing with CUDA in the USA this September. This advanced course is intended for experienced CUDA developers looking for optimization methods for image processing applications implemented on NVIDIA GPUs.
The course will be held in the San Francisco area on September 27-29, from 9am to 5pm.
“Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”
May 30th, 2010
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best-performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
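To see where an O(n^4) cost can come from, here is a minimal sketch assuming the plain, unnormalized form of 2-D cross-correlation evaluated at every relative shift of two same-sized images. The report's exact formulation (normalization, image sizes) is not given here, and `cross_correlate` is a hypothetical name. For n×n inputs there are (2n-1)^2 shifts, and each one is a dot product over up to n^2 overlapping pixels, hence O(n^4) multiply-adds.

```python
def cross_correlate(img, ref):
    """Full 2-D cross-correlation of two equally sized images (lists of rows).

    out[dy + h - 1][dx + w - 1] = sum over valid (y, x) of img[y][x] * ref[y + dy][x + dx]
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for dy in range(-(h - 1), h):           # (2h - 1) vertical shifts
        for dx in range(-(w - 1), w):       # (2w - 1) horizontal shifts
            s = 0.0
            # Only pixels where both img[y][x] and ref[y + dy][x + dx] exist contribute.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    s += img[y][x] * ref[y + dy][x + dx]
            out[dy + h - 1][dx + w - 1] = s
    return out
```

The four nested loops make the cost explicit. Note that Python floats are double precision, whereas the report works with single-precision values.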
(Rajesh Bordawekar, Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
CfP: SPIE Electronic Imaging 111: Parallel Processing for Imaging Applications
May 13th, 2010
Imaging translates information into and out of the visual system with today’s computation engine of choice: digital electronic systems. While scalar architectures are no longer scaling at historical rates, we see a massive explosion in the total number of connected computation devices and the ways that hardware architectures and software parallel programming environments use these devices to work in concert and in parallel. From the computing cloud to map-reduce programming models and systems to multi-core CPUs to the regular layout of graphics processing units (GPUs) to the increasing capacity of FPGA fabrics, a range of parallel architectures and parallel programming environments are available to designers and researchers to solve computationally complex problems in efficient (and often real-time) imaging applications.
Accelerating MATLAB Image Processing Toolbox Functions on GPUs
March 23rd, 2010
Abstract:
We present our effort in developing an open-source GPU (graphics processing unit) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen representative functions from IPT and, based on their inherent characteristics, grouped them into four categories: data independent, data sharing, algorithm dependent and data dependent. For each category, we present a detailed case study, which reveals interesting insights on how to efficiently optimize the code for GPUs and highlights performance-critical hardware features, some of which have not been well explored in the existing literature. Our results show drastic speedups for the functions in the data-independent or data-sharing categories by leveraging hardware support judiciously, and moderate speedups for those in the algorithm-dependent category by careful algorithm selection and parallelization. For the functions in the last category, fine-grained synchronization and data-dependency requirements are the main obstacles to an efficient implementation on GPUs.
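Reading the first two category names literally, the difference lies in how many input pixels each output pixel reads. The sketch below is our own illustration rather than code from the library: `pointwise_scale` (a linear intensity adjustment) is data-independent, since each output depends on exactly one input pixel, while `box3x3` (a 3x3 mean filter, averaging only the neighbors that exist at the borders) is data-sharing, since neighboring outputs read overlapping windows. Both function names are hypothetical.

```python
def pointwise_scale(img, gain, offset):
    """Data-independent: each output pixel reads exactly one input pixel."""
    return [[gain * p + offset for p in row] for row in img]


def box3x3(img):
    """Data-sharing: each output averages its 3x3 neighborhood (clipped at borders),
    so adjacent outputs reuse most of the same input pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s, n = 0.0, 0
            for yy in range(max(0, y - 1), min(h, y + 2)):
                for xx in range(max(0, x - 1), min(w, x + 2)):
                    s += img[yy][xx]
                    n += 1
            out[y][x] = s / n
    return out
```

On a GPU the pointwise case maps to one thread per pixel with no cooperation between threads, whereas the neighborhood case is where staging a tile in shared memory or reading through the texture cache typically pays off, because each input pixel is read by up to nine threads.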
(J. Kong et al., “Accelerating MATLAB Image Processing Toolbox Functions on GPUs”, Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA, Apr. 2010. Source code is available here.)
First TopCoder CUDA SuperHero Challenge Under Way
September 16th, 2009
Today NVIDIA and TopCoder launched the first contest in the CUDA SuperHero Challenge.
The first contest challenges participants to develop the highest-performing solution for GPU-accelerated Connected Component Labeling (CCL) of images. CCL is a simple but computationally intensive image processing operation that is used in many applications, including machine vision, real-time object recognition, and security.
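For readers unfamiliar with CCL, here is a sequential reference sketch of the classic two-pass algorithm with a union-find equivalence table, using 4-connectivity on a binary image. It only illustrates what the operation computes; it is not a contest entry, and GPU solutions typically use different, parallel schemes (for example, iterative label propagation). `label_components` is a hypothetical name.

```python
def label_components(img):
    """4-connected CCL of a non-empty binary image (rows of 0/1).

    Returns (labels, count): labels use 1..count, background stays 0.
    """
    h, w = len(img), len(img[0])
    parent = [0]  # union-find forest; index 0 is the background label
    labels = [[0] * w for _ in range(h)]

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # First pass: assign provisional labels and record which ones touch.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1
            elif up and left:
                ru, rl = find(up), find(left)
                labels[y][x] = min(ru, rl)
                parent[max(ru, rl)] = min(ru, rl)  # merge the two equivalence sets
            else:
                labels[y][x] = up or left
    # Second pass: replace each provisional label by a compact final label.
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

For example, a U-shaped blob ends up with a single label: its two arms receive different provisional labels in the first rows and are merged when the bottom row connects them.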
TopCoder is a large community of over 200,000 members, of whom over 32,000 have been active participants in the last 90 days. Anyone can register as a TopCoder member to participate in the CUDA SuperHero Challenge. Contestants around the world will be competing for some hard cash, and also for the opportunity to be TopCoder’s first CUDA SuperHeroes.
The challenge is simple to understand, and in fact it is simple to get a first implementation up and running, but winning will take plenty of CUDA skill, as the challenge exercises many CUDA optimization techniques.
The winners will be announced at the NVIDIA GPU Technology Conference at the end of September. See the contest details here.
NVIDIA Announces Performance Primitives (NVPP) Library
June 8th, 2009
NVIDIA NVPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NVPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NVPP library is written to maximize flexibility while maintaining high performance.
NVPP can be used in one of two ways:
- A stand-alone library for adding GPU acceleration to an application with minimal effort. Using this route allows developers to add GPU acceleration to their applications in a matter of hours.
- A cooperative library for interoperating with a developer’s GPU code efficiently.
Either route allows developers to harness the massive compute resources of NVIDIA GPUs while simultaneously reducing development times. The NVPP API matches the Intel Integrated Performance Primitives (IPP) library API, so porting existing IPP code to the GPU is straightforward. For more information and to sign up for access to the beta release of NVPP, visit the NVPP website.
GPU VSIPL Library
March 31st, 2009
GPU VSIPL is an implementation of the Vector Signal Image Processing Library (VSIPL) that targets Graphics Processing Units (GPUs) supporting NVIDIA’s CUDA platform. By leveraging processors capable of 900 GFLOP/s or more, your application may achieve considerable speedup without any specialized development for GPUs. The GPU VSIPL range-Doppler map application achieved a 75x speedup on the GPU simply by linking it with GPU VSIPL.
GPU VSIPL is currently released as a static library, and all releases are verified with the VSIPL Core Lite Test Suite.
GPU VSIPL was presented at the High Performance Embedded Computing Workshop 2008. Read the GPU VSIPL extended abstract [PDF]. For more information, visit the GPU VSIPL Website.