“GPU Algorithms for Image Processing and Computer Vision”, to be published by Springer, will contain a collection of articles on fundamental image processing and computer vision methods adapted for Graphics Processing Units (GPUs). In recent years, substantial efforts were undertaken to adapt many such algorithms for massively-parallel GPU-based systems. The book is envisioned as a consolidation of such work into a single volume covering widely used methods and techniques. Each chapter will be written by authors working on a specific group of methods. It will provide mathematical background, parallel algorithm, and implementation details leading to reusable, adaptable, and scalable code fragments. The book will serve as a GPU implementation manual for many image processing and analysis algorithms providing valuable insights into parallelization strategies for GPUs as well as ready-to-use code fragments with a broad appeal to both developers and researchers interested in GPU computing. Read the rest of this entry »
In this paper, we propose an efficient acceleration method for the nonrigid registration of multimodal images that uses a graphics processing unit (GPU). The key contribution of our method is efficient utilization of on-chip memory for both normalized mutual information (NMI) computation and hierarchical B-spline deformation, which compose a well-known registration algorithm. We implement this registration algorithm as a compute unified device architecture (CUDA) program with an efficient parallel scheme and several optimization techniques such as hierarchical data organization, data reuse, and multiresolution representation. We experimentally evaluate our method with four clinical datasets consisting of up to 512x512x296 voxels. We find that exploitation of onchip memory achieves a 12-fold increase in speed over an off-chip memory version and, therefore, it increases the efficiency of parallel execution from 4% to 46%. We also find that our method running on a GeForce GTX 580 card is approximately 14 times faster than a fully optimized CPU-based implementation running on four cores. Some multimodal registration results are also provided to understand the limitation of our method. We believe that our highly efficient method, which completes an alignment task within a few tens of second, will be useful to realize rapid nonrigid registration.
(Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara: “Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA”. Accepted for publication in the IEEE Journal of Biomedical and Health Informatics. [DOI])
OpenCLIPP is a library providing processing primitives (image processing primitives in the first version) implemented with OpenCL for fast execution on dedicated computing devices like GPUs. Two interfaces are provided: C (similar to the Intel IPP and NVIDIA NPP libraries) and C++. OpenCLIPP is free for personal and commercial use. It can be downloaded from GitHub.
M. Akhloufi, A. Campagna, “OpenCLIPP: OpenCL Integrated Performance Primitives library for computer vision applications”, Proc. SPIE Electronic Imaging 2014, Intelligent Robots and Computer Vision XXXI: Algorithms and Techniques, P. 9025-31, February 2014.
A free webinar on accelerating face-in-the-crowd recognition with GPU technology will be held on November 5th. It teaches how GPUs can be used to accelerate face detection and recognition of people in the crowd. The presentation will also cover the speakers’ use of ROS, OpenCV, OpenMP, and Armadillo libraries to develop fast reliable distributed video processing code. To register follow the link: https://www2.gotomeeting.com/register/292953058
This paper presents an accelerated version of copy-move image forgery detection scheme on the Graphics Processing Units or GPUs. With the replacement of analog cameras with their digital counterparts and availability of powerful image processing software packages, authentication of digital images has gained importance in the recent past. This paper focuses on improving the performance of a copy-move forgery detection scheme based on radix sort by porting it onto the GPUs. This scheme has enhanced performance and is much more efficient compared to other methods without degradation of detection results. The CPU version of the radix-sort based detection scheme was developed in Matlab and critical sections of the CPU version were coded in C-language using Matlab’s Mex interface to get the maximum performance. The GPU version was developed using Jacket GPU Engine for Matlab and performs over twelve times faster than its optimized CPU variant. The contribution this paper makes towards blind image forensics is the use of integral images for computing feature vectors of overlapping blocks in block-matching technique and acceleration of the entire copy-move forgery detection scheme on the GPUs, not found in literature.
(Jaideep Singh and Balasubramanian Raman, “A High Performance Copy-Move Image Forgery Detection Scheme on GPU”, Advances in Intelligent and Soft Computing Volume 131, 2012, pp 239-246, Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011). [DOI])
Anatoly Baksheev, OpenCV GPU Module Team Leader at Itseez will demonstrate how to obtain and build OpenCV, its GPU module, and the sample programs. You will learn how to use the OpenCV GPU module and create your own custom GPU functions for OpenCV. Register for the July 30th webinar: http://goo.gl/5V3eA
From a recent press release:
AMD’s APP SDK is an essential resource for developers who wish to leverage the processing power of heterogeneous computing. OpenCL™ is the primary mechanism for achieving this today, but AMD’s goal is to enable developers to accelerate applications with the programming paradigm of their choice. Toward that end, AMD has added support for heterogeneous libraries such as the newly released Bolt open source C++ template library and OpenCV computer vision library which now includes heterogeneous acceleration.
New to APP SDK 2.8.1:
Bolt: With the recent launch of Bolt 1.0, AMD has added several samples to the APP SDK to demonstrate Bolt 1.0 features. These showcase the usage of Bolt APIs such as scan, sort, reduce and transform. Other new samples highlight the ease of porting from STL and the performance benefits achieved over equivalent STL implementations. We’ve also included samples to demonstrate the different fallback options available in Bolt 1.0 when no GPU is available which ensure your code runs correctly on any platform.
OpenCV: AMD has been working closely with the OpenCV open source community to add heterogeneous acceleration capability to the world’s most popular computer vision library. These changes are already integrated into OpenCV and are readily available for developers who want to improve performance and efficiency of their computer vision applications. AMD has included samples to illustrate these improvements and highlight how simple it is to include them in your app.
GCN: AMD recently launched its new Graphics Core Next (GCN) architecture on several AMD products. GCN is based on a scalar architecture vs. the VLIW vector architecture of prior generations, so hand-tuned vectorization to optimize hardware utilization is no longer needed. We’ve modified several samples in AMD APP SDK 2.8.1 to show the ease of writing scalar code as compared to vectorization.
For more information, see developer.amd.com.
This 1-hour webinar (June 11, 10am-11am PST) introduces the powerful OpenCV library, shows how this library has been accelerated using CUDA on NVIDIA GPUs, and demonstrates how to use the OpenCV GPU library to create lightning-fast applications. Free registration: http://bit.ly/11eqoaJ
Recently, general-purpose computing on graphics processing units (GPGPU) has been enabled on mobile devices thanks to the emerging heterogeneous programming models such as OpenCL. The capability of GPGPU on mobile devices opens a new era for mobile computing and can enable many computationally demanding computer vision algorithms on mobile devices. As a case study, this paper proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL. We discuss the methodology of exploring the parallelism in the algorithm as well as several optimization techniques. Experimental results demonstrate that our optimization strategies for mobile GPUs have significantly reduced the processing time and make computationally intensive computer vision algorithms feasible for a mobile device. To the best of the authors’ knowledge, this work is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.
(Guohui Wang, Yingen Xiong, Jay Yun and Joseph R. Cavallaro: “Accelerating Computer Vision Algorithms Using OpenCL on the Mobile GPU – A Case Study”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, May 2013, to appear. [PDF])
TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications. CUVILib adds acceleration to Imaging applications from Medical, Industrial and Defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction etc), linear/non-linear filtering, feature extraction & tracking, motion estimation, image transforms and image statistics.
More information, including a free trial version: http://www.cuvilib.com/