October 7th, 2013
September 22nd, 2013
This paper presents an accelerated version of copy-move image forgery detection scheme on the Graphics Processing Units or GPUs. With the replacement of analog cameras with their digital counterparts and availability of powerful image processing software packages, authentication of digital images has gained importance in the recent past. This paper focuses on improving the performance of a copy-move forgery detection scheme based on radix sort by porting it onto the GPUs. This scheme has enhanced performance and is much more efficient compared to other methods without degradation of detection results. The CPU version of the radix-sort based detection scheme was developed in Matlab and critical sections of the CPU version were coded in C-language using Matlab’s Mex interface to get the maximum performance. The GPU version was developed using Jacket GPU Engine for Matlab and performs over twelve times faster than its optimized CPU variant. The contribution this paper makes towards blind image forensics is the use of integral images for computing feature vectors of overlapping blocks in block-matching technique and acceleration of the entire copy-move forgery detection scheme on the GPUs, not found in literature.
(Jaideep Singh and Balasubramanian Raman, “A High Performance Copy-Move Image Forgery Detection Scheme on GPU”, Advances in Intelligent and Soft Computing Volume 131, 2012, pp 239-246, Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011). [DOI])
September 22nd, 2013
Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent developments in mobile processors have enabled heterogeneous computing on mobile devices, such as smartphones and tablets. In this paper, we present an OpenCL-based implementation of the SIFT algorithm on a smartphone, taking advantage of the mobile GPU. We carefully analyze the SIFT workloads and identify the parallelism. We implemented major steps of the SIFT algorithm using both serial C++ code and OpenCL kernels targeting mobile processors, to compare the performance of different workflows. Based on the profiling results, we partition the SIFT algorithm between the CPU and GPU in a way that best exploits the parallelism and minimizes the buffer transferring time to achieve better performance. The experimental results show that we are able to achieve 8.5 FPS for keypoints detection and 19 FPS for descriptor generation without reducing the number and the quality of the keypoints. Moreover, the heterogeneous implementation can reduce energy consumption by 41% compared to an optimized CPU-only implementation.
(Guohui Wang, Blaine Rister, and Joseph R. Cavallaro: “Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone”, 1st IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 2013, [PDF])
March 13th, 2013
Fastvideo have released their JPEG codec for NVIDIA GPUs. Peak performance of the codec reaches 6 GBytes per second and higher for images loadedfrom host RAM. For instance, a full-color 4K image with resolution 3840 x 2160 can be compressed by 10 times in merely 6 milliseconds on NVIDIA GeForce GTX Titan. More information: http://www.fastcompression.com
February 10th, 2013
The GPU Debayer software developed by Fastvideo can be used for demosaicing of raw 8-bit Bayer images to full-color 24-bit RGB format. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion, e.g., only 1.25 ms for Full HD image demosaicing on GeForce GTX 580. The software is freely available.
May 17th, 2012
This class teaches the fundamentals of parallel computing with the GPU and the CUDA programming environment. Examples are based on a series of image processing algorithms, such as those in Photoshop or Instagram. Programming and running assignments on high-end GPUs is possible, even if you don’t own one yourself. The course started Monday 4th Feb 2013 so there is still time to join. More information and enrollment: https://www.udacity.com/course/cs344.
March 18th, 2012
TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications. CUVILib adds acceleration to Imaging applications from Medical, Industrial and Defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction etc), linear/non-linear filtering, feature extraction & tracking, motion estimation, image transforms and image statistics.
More information, including a free trial version: http://www.cuvilib.com/
August 29th, 2011
Modern GPUs are well suited for performing image processing tasks. We utilize their high computational performance and memory bandwidth for image segmentation purposes. We segment cardiac MRI data by means of numerical solution of an anisotropic partial differential equation of the Allen-Cahn type. We implement two different algorithms for solving the equation on the CUDA architecture. One of them is based on the Runge-Kutta-Merson method for the approximation of solutions of ordinary differential equations, the other uses the GMRES method for the numerical solution of systems of linear equations. In our experiments, the CUDA implementations of both algorithms are about 3–9 times faster than corresponding 12-threaded OpenMP implementations.
(Oberhuber T., Suzuki A., Vacata J., Žabka V., “Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods“, Journal of Math-for-Industry, 2011, vol. 3, pp. 73–79 [PDF])
July 24th, 2011
The Hough transform is a commonly used algorithm to detect lines and other features in images. It is robust to noise and occlusion, but has a large computational cost. This paper introduces two new implementations of the Hough transform for lines on a GPU. One focuses on minimizing processing time, while the other has an input-data independent processing time. Our results show that optimizing the GPU code for speed can achieve a speed-up over naive GPU code of about 10x. The implementation which focuses on processing speed is the faster one for most images, but the implementation which achieves a constant processing time is quicker for about 20% of the images.
(Gert-Jan van den Braak, Cedric Nugteren, Bart Mesman and Henk Corporaal: “Fast Hough Transform on GPUs: Exploration of Algorithm Trade-offs”. In: Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science, Vol. 6915, pp.611-622, 2011. [DOI])
July 17th, 2011
TunaCode is pleased to announce the release of CUVI (CUDA Vision and Imaging Library) version 0.5 which comes with a new API and new features. This release makes it even simpler to add acceleration to existing Imaging applications, without any prior technical knowledge of GPUs. CUVI v0.5 is built from bottom up with performance and ease-of-use in mind.
CUVI version 0.5 is available for download at http://cuvilib.com and is available for Windows (Win32, x64) with planned support for Linux and Mac.
Functional magnetic resonance imaging (fMRI) makes it possible to non-invasively measure brain activity with high spatial resolution. There are however a number of issues that have to be addressed. One is the large amount of spatio-temporal data that needs to be processed. In addition to the statistical analysis itself, several preprocessing steps, such as slice timing correction and motion compensation, are normally applied. The high computational power of modern graphic cards has already successfully been used for MRI and fMRI. Going beyond the ﬁrst published demonstration of GPU-based analysis of fMRI data, all the preprocessing steps and two statistical approaches, the general linear model (GLM) and canonical correlation analysis (CCA), have been implemented on a GPU. For an fMRI dataset of typical size (80 volumes with 64 x 64 x 22 voxels), all the preprocessing takes about 0.5 s on the GPU, compared to 5 s with an optimized CPU implementation and 120 s with the commonly used statistical parametric mapping (SPM) software. A random permutation test with 10 000 permutations, with smoothing in each permutation, takes about 50 s if three GPUs are used, compared to 0.5 – 2.5 h with an optimized CPU implementation. The presented work will save time for researchers and clinicians in their daily work and enables the use of more advanced analysis, such as non-parametric statistics, both for conventional fMRI and for real-time fMRI.
(Anders Eklund, Mats Andersson, Hans Knutsson: “fMRI Analysis on the GPU – Possibilities and Challenges”, Computer Methods and Programs in Biomedicine, 2011 [DOI])