August 1st, 2010

TunaCode has announced the release of CUVI Lib v0.3 (beta) for 32-bit and 64-bit Windows systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
- Optical Flow (Horn & Schunck)
- Optical Flow (Lucas & Kanade)
- Discrete Wavelet Transform (Forward and Inverse)
- Hough Transform
- Hough Lines (Lines Detector)
- Color Conversion (RGB-to-Gray and RGBA-to-Gray)
Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded here. Forums to discuss feedback and further ideas are available.
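As an illustration of the simplest item in the list above, here is a minimal CPU sketch of RGB-to-gray and RGBA-to-gray conversion. It is not CUVI Lib code: the function names are hypothetical, and the ITU-R BT.601 luma weights are an assumption, since the announcement does not say which coefficients the library uses.

```python
# Assumption: ITU-R BT.601 luma weights. The announcement does not state
# which coefficients CUVI Lib actually uses.
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_gray(pixel):
    """Convert one (R, G, B) or (R, G, B, A) pixel to an 8-bit gray value."""
    r, g, b = pixel[:3]  # an alpha channel, if present, is ignored
    return round(WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b)


def image_to_gray(image):
    """Convert a 2-D list of pixels (rows of tuples) to a 2-D list of gray values."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]
```

Each output pixel depends only on the matching input pixel, which is why this kind of operation maps naturally onto one GPU thread per pixel.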
Posted in Developer Resources | Tags: Computer Vision, Image Processing, NVIDIA CUDA
July 4th, 2010
SagivTech plans to offer a 3-day course on Image Processing with CUDA in the USA this September. This is an advanced course intended for experienced CUDA developers looking for optimization methods for image processing applications implemented on NVIDIA GPUs.
The course will be held in the San Francisco area, 9am to 5pm September 27-29.
Posted in Business, Developer Resources, Events | Tags: Image Processing, NVIDIA CUDA, Tutorials & Courses
May 30th, 2010
Abstract:
In this work, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
(Rajesh Bordawekar and Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
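To make the O(n^4) operation count in the abstract concrete, the sketch below correlates an n x n image with an n x n reference at every one of the n^2 possible shifts, doing an n^2-term dot product per shift. This is not the paper's implementation: the function names are hypothetical, and the circular (wrap-around) shift is an assumption chosen to keep the example short.

```python
def cross_correlate(image, reference):
    """Score every (dy, dx) shift of `image` against `reference`.

    Both arguments are n x n lists of floats. There are n*n shifts and each
    shift costs n*n multiply-adds, giving n**4 operations in total.
    Assumption: shifts wrap around the image borders (circular correlation).
    """
    n = len(image)
    scores = [[0.0] * n for _ in range(n)]
    for dy in range(n):
        for dx in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += image[(y + dy) % n][(x + dx) % n] * reference[y][x]
            scores[dy][dx] = total
    return scores


def best_shift(scores):
    """Return the (dy, dx) shift with the highest correlation score."""
    n = len(scores)
    shifts = ((dy, dx) for dy in range(n) for dx in range(n))
    return max(shifts, key=lambda s: scores[s[0]][s[1]])
```

The two outer loops are independent of each other, so each shift can be computed in parallel, whether by CPU threads or GPU thread blocks.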
Posted in Research | Tags: Image Processing, Multicore, NVIDIA CUDA, Papers | 6 Comments
May 13th, 2010
Imaging translates information into and out of the visual system with today’s computation engine of choice: digital electronic systems. While scalar architectures are no longer scaling at historical rates, we see a massive explosion in the total number of connected computation devices and the ways that hardware architectures and software parallel programming environments use these devices to work in concert and in parallel. From the computing cloud to map-reduce programming models and systems to multi-core CPUs to the regular layout of graphics processing units (GPUs) to the increasing capacity of FPGA fabrics, a range of parallel architectures and parallel programming environments are available to designers and researchers to solve computationally complex problems in efficient (and often real-time) imaging applications.
Posted in Events, Research | Tags: Call for Papers, Image Processing, Parallel Computing
March 23rd, 2010
Abstract:
We present our effort in developing an open-source GPU (graphics processing units) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen of representative functions from IPT and based on their inherent characteristics, we grouped these functions into four categories: data independent, data sharing, algorithm dependent and data dependent. For each category, we present a detailed case study, which reveals interesting insights on how to efficiently optimize the code for GPUs and highlight performance-critical hardware features, some of which have not been well explored in existing literature. Our results show drastic speedups for the functions in the data-independent or data-sharing category by leveraging hardware support judiciously; and moderate speedups for those in the algorithm-dependent category by careful algorithm selection and parallelization. For the functions in the last category, fine-grain synchronization and data-dependency requirements are the main obstacles to an efficient implementation on GPUs.
(J. Kong et al., “Accelerating MATLAB Image Processing Toolbox Functions on GPUs”, Proceedings of the Third Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3), Pittsburgh, PA, Apr. 2010. Source code is available here.)
Posted in Developer Resources, Research | Tags: ATI Stream, Image Processing, MATLAB, NVIDIA CUDA, OpenCL, Papers
September 16th, 2009
Today NVIDIA and TopCoder launched the first contest in the CUDA SuperHero Challenge.
The first contest challenges participants to develop the highest performing solution for GPU-Accelerated Connected Component Labeling of images. CCL is a simple but computationally intensive image processing operation that is used in many applications including machine vision, real-time object recognition, and security.
TopCoder is a large community of over 200,000 members, of whom over 32,000 have been active participants in the last 90 days. Anyone can register as a TopCoder member to participate in the CUDA SuperHero Challenge. Contestants around the world will be competing for some hard cash and for the opportunity to be TopCoder’s first CUDA SuperHeroes.
The challenge is simple to understand and in fact simple to get a first implementation up and running; but winning will take plenty of CUDA skill as the challenge will exercise many CUDA optimization techniques.
The winners will be announced at the NVIDIA GPU Technology Conference at the end of September. See the contest details here.
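For readers unfamiliar with connected component labeling, the following is a minimal sequential reference sketch, not a contest entry and not a GPU implementation. The function name is hypothetical, and both the binary input and the 4-connectivity rule are assumptions; the contest's exact problem definition is given in the contest details.

```python
from collections import deque


def label_components(image):
    """Label connected foreground regions of a binary image.

    `image` is a 2-D list where truthy values are foreground.
    Assumption: 4-connectivity (pixels touch via up/down/left/right only).
    Returns (labels, count), where background pixels are labeled 0.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for sy in range(rows):
        for sx in range(cols):
            if not image[sy][sx] or labels[sy][sx]:
                continue  # background, or already part of a labeled region
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

The flood fill is inherently serial within a region, which hints at why a fast GPU version takes real effort: a parallel solution has to merge labels across regions that many threads discover at the same time.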
Posted in Events | Tags: Contests, Image Processing, NVIDIA CUDA
June 8th, 2009
NVIDIA NVPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NVPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NVPP library is written to maximize flexibility while maintaining high performance.
NVPP can be used in one of two ways:
- A stand-alone library for adding GPU acceleration to an application with minimal effort. Using this route allows developers to add GPU acceleration to their applications in a matter of hours.
- A cooperative library for interoperating with a developer’s GPU code efficiently.
Either route allows developers to harness the massive compute resources of NVIDIA GPUs, while simultaneously reducing development times. The NVPP API matches the Intel Performance Primitives (IPP) library API so that porting existing IPP code to the GPU is easy to do. For more information and to sign up for access to the beta release of NVPP, visit the NVPP website.
Posted in Developer Resources | Tags: Image Processing, Intel, Libraries, NVIDIA CUDA, NVPP, Performance Primitives, Video Processing
March 31st, 2009
Posted in Developer Resources, Research | Tags: Image Processing, Libraries, Papers, Signal Processing
August 11th, 2008
This paper describes an implementation of fast deformable image registration using GPUs and CUDA in radiation therapy. Using lung and prostate volumetric imaging, the GPU implementation is 40-66 times faster than a single-threaded CPU implementation and 25-41 times faster than a multithreaded CPU implementation. The paradigm of GPU-based near-real-time deformable image registration opens up a host of clinical applications for medical imaging. (Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, John D. Owens: “High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy”. Medical Physics, 2008.)
Posted in Research | Tags: Image Processing, Medical Imaging, Papers
April 23rd, 2008
Vision4ce launched a new line of General-purpose Rugged Image Processing (GRIP) products at the recent SPIE Defense and Security Symposium in Orlando, held 18th-20th March 2008. The GRIP-Beta showed cutting-edge GPGPU-based image processing demonstrations, analog and Gigabit Ethernet video streams, and the robust functionality of the Gripworkx image processing framework. With GRIP, the Vision4ce team now addresses numerous rugged embedded computing challenges with a cost-effective, readily available rugged solution for tasks that might normally be served by more expensive and lengthy FPGA approaches. See www.vision4ce.com for more information.
Posted in Business | Tags: Image Processing, systems