The GPU Debayer software developed by Fastvideo demosaics raw 8-bit Bayer images into full-color 24-bit RGB. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion: only 1.25 ms for Full HD image demosaicing on a GeForce GTX 580. The software is freely available.
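For context, demosaicing reconstructs a full RGB image from the single-channel Bayer mosaic a sensor captures. The sketch below is not Fastvideo's HQLI or DFPD code, just a minimal half-resolution baseline for an RGGB mosaic, where each 2x2 block yields one RGB pixel and the two green samples are averaged:

```python
import numpy as np

def demosaic_half(bayer):
    """Half-resolution demosaic of an RGGB Bayer mosaic: each 2x2 block
    (R G / G B) becomes one RGB pixel, averaging the two green samples.
    A simple baseline, nothing like HQLI or DFPD in quality."""
    r = bayer[0::2, 0::2].astype(np.float32)
    g = (bayer[0::2, 1::2].astype(np.float32) +
         bayer[1::2, 0::2].astype(np.float32)) / 2.0
    b = bayer[1::2, 1::2].astype(np.float32)
    return np.stack([r, g, b], axis=-1).astype(np.uint8)

# A flat gray scene: every Bayer sample is 128, so every output pixel is gray.
mosaic = np.full((4, 6), 128, dtype=np.uint8)
rgb = demosaic_half(mosaic)
# rgb.shape == (2, 3, 3); all values 128
```

Real demosaicers (HQLI, DFPD) instead interpolate at full resolution using gradients and cross-channel correlation, which is what makes them a good fit for GPU parallelism.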
This class teaches the fundamentals of parallel computing with the GPU and the CUDA programming environment. Examples are based on a series of image processing algorithms, such as those in Photoshop or Instagram. Programming and running assignments on high-end GPUs is possible even if you don't own one yourself. The course started on Monday, February 4th, 2013, so there is still time to join. More information and enrollment: https://www.udacity.com/course/cs344.
TunaCode has released CUVILib v1.2, a library that accelerates imaging and computer vision applications in the medical, industrial, and defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaicing, conversions, correction, etc.), linear and non-linear filtering, feature extraction and tracking, motion estimation, image transforms, and image statistics.
More information, including a free trial version: http://www.cuvilib.com/
Modern GPUs are well suited for performing image processing tasks. We utilize their high computational performance and memory bandwidth for image segmentation purposes. We segment cardiac MRI data by means of numerical solution of an anisotropic partial differential equation of the Allen-Cahn type. We implement two different algorithms for solving the equation on the CUDA architecture. One of them is based on the Runge-Kutta-Merson method for the approximation of solutions of ordinary differential equations, the other uses the GMRES method for the numerical solution of systems of linear equations. In our experiments, the CUDA implementations of both algorithms are about 3–9 times faster than corresponding 12-threaded OpenMP implementations.
(Oberhuber T., Suzuki A., Vacata J., Žabka V.: “Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods”, Journal of Math-for-Industry, 2011, vol. 3, pp. 73–79 [PDF])
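To give a feel for the equation being solved, the sketch below takes one forward-Euler time step of a simple isotropic Allen-Cahn equation on a periodic grid. The paper solves an anisotropic variant with the Runge-Kutta-Merson and GMRES methods; this explicit step is only a minimal illustration of the time evolution, not the authors' scheme.

```python
import numpy as np

def allen_cahn_step(u, dt, eps):
    """One forward-Euler step of a simple isotropic Allen-Cahn equation,
        du/dt = eps * laplace(u) + (1/eps) * u * (1 - u) * (u - 0.5),
    on a periodic grid (laplacian via np.roll). The double-well reaction
    term drives each point toward one of the pure phases u = 0 or u = 1,
    which is what makes the equation useful for segmentation."""
    lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
           np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)
    return u + dt * (eps * lap + u * (1.0 - u) * (u - 0.5) / eps)

# The pure phases u = 0 and u = 1 are fixed points of the dynamics.
zeros = allen_cahn_step(np.zeros((8, 8)), dt=0.01, eps=0.5)
ones = allen_cahn_step(np.ones((8, 8)), dt=0.01, eps=0.5)
```

The per-pixel stencil update is embarrassingly parallel, which is why both the Runge-Kutta-Merson and GMRES variants map well onto CUDA.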
The Hough transform is a commonly used algorithm to detect lines and other features in images. It is robust to noise and occlusion, but has a large computational cost. This paper introduces two new implementations of the Hough transform for lines on a GPU. One focuses on minimizing processing time, while the other has an input-data independent processing time. Our results show that optimizing the GPU code for speed can achieve a speed-up over naive GPU code of about 10x. The implementation which focuses on processing speed is the faster one for most images, but the implementation which achieves a constant processing time is quicker for about 20% of the images.
(Gert-Jan van den Braak, Cedric Nugteren, Bart Mesman and Henk Corporaal: “Fast Hough Transform on GPUs: Exploration of Algorithm Trade-offs”. In: Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science, Vol. 6915, pp.611-622, 2011. [DOI])
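For readers new to the algorithm, the Hough transform for lines lets every edge pixel vote for all (rho, theta) line parameters consistent with it; peaks in the vote accumulator correspond to lines. The plain CPU sketch below shows the reference semantics that the paper's two GPU implementations optimize:

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Classic Hough transform for lines: every edge pixel votes for all
    (rho, theta) parameter pairs of lines passing through it, where
    rho = x*cos(theta) + y*sin(theta). A plain CPU reference of the
    algorithm the paper maps onto the GPU."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))          # bound on |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1  # one vote per (pixel, theta)
    return acc, diag

# A vertical line x == 2: every pixel votes for (rho=2, theta=0).
img = np.zeros((5, 5), dtype=bool)
img[:, 2] = True
acc, diag = hough_lines(img)
# acc[diag + 2, 0] == 5 (all five edge pixels agree on that line)
```

On a GPU the votes become atomic additions into the shared accumulator, which is exactly the contention the paper's two implementations trade off against each other.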
TunaCode is pleased to announce the release of CUVI (CUDA Vision and Imaging Library) version 0.5, which comes with a new API and new features. This release makes it even simpler to add acceleration to existing imaging applications, without any prior technical knowledge of GPUs. CUVI v0.5 is built from the bottom up with performance and ease of use in mind.
CUVI version 0.5 is available for download at http://cuvilib.com and is available for Windows (Win32, x64) with planned support for Linux and Mac.
Functional magnetic resonance imaging (fMRI) makes it possible to non-invasively measure brain activity with high spatial resolution. There are, however, a number of issues that have to be addressed. One is the large amount of spatio-temporal data that needs to be processed. In addition to the statistical analysis itself, several preprocessing steps, such as slice timing correction and motion compensation, are normally applied. The high computational power of modern graphics cards has already been used successfully for MRI and fMRI. Going beyond the first published demonstration of GPU-based analysis of fMRI data, all the preprocessing steps and two statistical approaches, the general linear model (GLM) and canonical correlation analysis (CCA), have been implemented on a GPU. For an fMRI dataset of typical size (80 volumes with 64 x 64 x 22 voxels), all the preprocessing takes about 0.5 s on the GPU, compared to 5 s with an optimized CPU implementation and 120 s with the commonly used statistical parametric mapping (SPM) software. A random permutation test with 10,000 permutations, with smoothing in each permutation, takes about 50 s if three GPUs are used, compared to 0.5–2.5 hours with an optimized CPU implementation. The presented work will save time for researchers and clinicians in their daily work and enables the use of more advanced analysis, such as non-parametric statistics, both for conventional fMRI and for real-time fMRI.
(Anders Eklund, Mats Andersson, Hans Knutsson: “fMRI Analysis on the GPU – Possibilities and Challenges”, Computer Methods and Programs in Biomedicine, 2011 [DOI])
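For readers unfamiliar with the GLM step mentioned above: the same design matrix is fit to every voxel's time series by ordinary least squares, so all voxels can be solved in one batched operation. The sketch below uses a made-up design (intercept plus a sinusoidal regressor) and synthetic data with 80 time points, matching the dataset size quoted above; it is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_vox = 80, 10                      # 80 volumes, a handful of voxels
X = np.column_stack([np.ones(n_t),       # intercept
                     np.sin(np.linspace(0.0, 8.0 * np.pi, n_t))])  # regressor
beta_true = np.array([1.0, 2.0])
Y = X @ np.tile(beta_true[:, None], (1, n_vox)) \
    + 0.01 * rng.standard_normal((n_t, n_vox))

# GLM: fit Y = X @ beta + noise for all voxels at once by least squares.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
# beta_hat has shape (2, n_vox): one coefficient pair per voxel
```

Because every voxel shares the design matrix, the per-voxel fits are independent, which is why the GLM parallelizes so naturally on a GPU.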
Fast Random Permutation Tests Enable Objective Evaluation of Methods for Single Subject fMRI Analysis (July 17th, 2011)
Parametric statistical methods based on Z-, t-, and F-values are traditionally employed in functional magnetic resonance imaging (fMRI) to identify areas in the brain that are active with a certain degree of statistical significance. These parametric methods, however, have two major drawbacks. First, the observed data are assumed to be Gaussian distributed and independent, assumptions that generally are not valid for fMRI data. Second, the statistical test distribution can be derived theoretically only for very simple linear detection statistics. With non-parametric statistical methods, both limitations can be overcome. The major drawback of non-parametric methods is the computational burden, with processing times ranging from hours to days, which has so far made them impractical for routine use in single subject fMRI analysis. In this work, it is shown how the computational power of cost-efficient graphics processing units (GPUs) can be used to speed up random permutation tests. A test with 10,000 permutations takes less than a minute, making statistical analysis with advanced detection methods in fMRI practically feasible. To exemplify the permutation-based approach, brain activity maps generated by the general linear model (GLM) and canonical correlation analysis (CCA) are compared at the same significance level. During the development of the routines and the writing of the paper, 3–4 years of processing time was saved by using the GPU.
(Anders Eklund, Mats Andersson, Hans Knutsson: “Fast Random Permutation Tests Enable Objective Evaluation of Methods for Single Subject fMRI Analysis”, International Journal of Biomedical Imaging, Article ID 627947, 2011 [Youtube Video] [PDF])
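The core idea of a random permutation test is simple: build the null distribution empirically by reshuffling labels and recomputing the statistic. The sketch below is a plain CPU two-sample test on the difference of group means, a much simpler statistic than the paper's GLM/CCA maps, but the same permutation machinery:

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test on the difference of group means: the
    null distribution comes from randomly relabeling the pooled samples.
    A plain CPU version of the kind of test the paper runs on the GPU,
    where each permutation is an independent, parallelizable job."""
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                    # random relabeling
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)          # add-one-smoothed p-value

# Two clearly separated groups should give a very small p-value.
p = permutation_pvalue(np.arange(10.0), np.arange(10.0) + 100.0)
```

Since each permutation is independent of all others, the loop is trivially parallel, which is exactly what makes GPUs cut the runtime from hours to seconds.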
The use of image denoising techniques is an important part of many medical imaging applications. One common application is to improve the image quality of low-dose, i.e. noisy, computed tomography (CT) data. The medical imaging domain has seen tremendous development during the last decades. It is now possible to collect time-resolved volumes, i.e. 4D data, with a number of modalities (e.g. ultrasound (US), CT, magnetic resonance imaging (MRI)). While 3D image denoising has previously been applied to several volumes independently, there has not been much work done on true 4D image denoising, where the algorithm considers several volumes at the same time (rather than a single volume at a time). By using all the dimensions, it is for example possible to remove some of the time-varying reconstruction artefacts that exist in CT volumes. The problem with 4D image denoising, compared to 2D and 3D denoising, is that the computational complexity increases exponentially. In this paper we describe a novel algorithm for true 4D image denoising, based on local adaptive filtering, and how to implement it on the graphics processing unit (GPU). The algorithm was applied to a 4D CT heart dataset with a resolution of 512 x 512 x 445 x 20. The GPU completes the denoising in about 25 minutes with spatial filtering and in about 8 minutes with FFT-based filtering; the CPU implementation requires several days of processing time for spatial filtering and about 50 minutes for FFT-based filtering. Fast spatial filtering makes it possible to apply the denoising algorithm to larger datasets than FFT-based filtering allows. The short processing time increases the clinical value of true 4D image denoising significantly.
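As a rough illustration of FFT-based filtering (not the paper's local adaptive filter), the sketch below denoises a volume by zeroing high spatial frequencies: forward FFT, low-pass mask, inverse FFT. The `keep` fraction is an arbitrary parameter of this sketch:

```python
import numpy as np

def fft_lowpass(vol, keep=0.25):
    """Denoise an n-D volume by zeroing high spatial frequencies:
    forward FFT, separable low-pass mask, inverse FFT. A crude
    stand-in for the paper's local adaptive FFT-based filtering."""
    F = np.fft.fftn(vol)
    mask = np.ones(vol.shape, dtype=bool)
    for ax in range(vol.ndim):
        freqs = np.fft.fftfreq(vol.shape[ax])   # cycles/sample in [-0.5, 0.5)
        shape = [1] * vol.ndim
        shape[ax] = -1
        mask &= np.abs(freqs).reshape(shape) <= keep / 2.0
    return np.real(np.fft.ifftn(F * mask))

# A constant volume only has energy at frequency zero, so it passes through.
out = fft_lowpass(np.full((8, 8, 8), 3.0))
```

Transforming once and filtering in the frequency domain replaces a huge spatial convolution, which is why the FFT variant is so much faster on both CPU and GPU.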
Many image processing applications use the histogramming algorithm, which fills a set of bins according to the frequency of occurrence of pixel values taken from an input image. Histogramming has been mapped on a GPU prior to this work. Although significant research effort has been spent in optimizing the mapping, we show that the performance and performance predictability of existing methods can still be improved.
In this paper, we present two novel histogramming methods, both achieving higher performance and predictability than existing methods. We discuss the performance limitations of both methods by exploring algorithm trade-offs.
The first novel method gives an average performance increase of 33% over existing methods on non-synthetic benchmarks. The second gives an average performance increase of 56% over existing methods and is guaranteed to be fully data-independent. While the second method is specifically designed for the Fermi GPU architecture, the first is also suitable for older architectures.
(Cedric Nugteren, Gert-Jan van den Braak, Henk Corporaal, Bart Mesman: “High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs”, GPGPU-4: Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. [DOI] [Paper and Source Code])
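For reference, the histogramming kernel being optimized has very simple semantics: count how often each pixel value occurs. The sketch below is the plain CPU version of those semantics, not the paper's GPU code:

```python
import numpy as np

def histogram(image, n_bins=256):
    """Fill bins by frequency of occurrence of pixel values. On a GPU the
    same semantics are usually obtained with per-block sub-histograms and
    atomic adds, then a final reduction; this loop (np.bincount is the
    vectorized form) is just the CPU reference."""
    bins = np.zeros(n_bins, dtype=np.int64)
    for v in image.ravel():
        bins[v] += 1
    return bins

img = np.array([[0, 1, 1], [2, 2, 2]], dtype=np.uint8)
h = histogram(img)
# h[0] == 1, h[1] == 2, h[2] == 3
```

The difficulty on a GPU is that thousands of threads increment the same bins concurrently, so performance depends on the input's value distribution, which is exactly the data-dependence the paper's second method eliminates.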