Fast GPU Debayer Software

March 13th, 2013

The GPU Debayer software developed by Fastvideo can be used for demosaicing of raw 8-bit Bayer images to full-color 24-bit RGB format. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion, e.g., only 1.25 ms for Full HD image demosaicing on GeForce GTX 580. The software is freely available.
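
To make the demosaicing step concrete, here is a minimal CPU sketch in Python. It is not the HQLI or DFPD algorithm used by the Fastvideo software; it shows only plain bilinear interpolation, the simplest demosaicing scheme. The RGGB pattern layout and the function name `demosaic_bilinear` are assumptions made for illustration.

```python
def demosaic_bilinear(raw):
    """Convert an 8-bit Bayer image (list of rows, RGGB layout assumed)
    into rows of 24-bit (R, G, B) tuples using bilinear interpolation."""
    h, w = len(raw), len(raw[0])

    def channel_at(y, x):
        # Assumed RGGB layout: even row/even column is red,
        # odd row/odd column is blue, everything else is green.
        if y % 2 == 0 and x % 2 == 0:
            return 0
        if y % 2 == 1 and x % 2 == 1:
            return 2
        return 1

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sums = [0, 0, 0]
            counts = [0, 0, 0]
            # Accumulate every sample in the 3x3 neighbourhood by colour.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        c = channel_at(ny, nx)
                        sums[c] += raw[ny][nx]
                        counts[c] += 1
            # Missing colours are the average of their neighbours.
            pixel = [sums[c] // counts[c] if counts[c] else 0 for c in range(3)]
            # The colour actually measured at this site is kept unchanged.
            pixel[channel_at(y, x)] = raw[y][x]
            row.append(tuple(pixel))
        out.append(row)
    return out


if __name__ == "__main__":
    print(demosaic_bilinear([[10, 20], [30, 40]]))
```

Every output pixel depends only on a small fixed neighbourhood of the input, which is why this kind of algorithm maps well to one GPU thread per pixel.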

An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads

March 13th, 2013


Due to ever increasing demand for fast processing of large analytical workloads, main memory column-oriented databases have attracted a lot of attention in recent years. In-memory databases eliminate the disk I/O barrier by storing the data in memory. In addition, they utilize a column-oriented data layout to offer a multi-core-friendly and memory-bandwidth-efficient processing scheme. On the other hand, recently, graphics processing units (GPUs) have emerged as powerful tools for general high-performance computing. GPUs are affordable and energy-efficient devices that deliver massive computational power by utilizing a large number of cores and a high memory bandwidth. GPUs can be used as co-processors for query acceleration of in-memory databases. One of the main bottlenecks in GPU-acceleration of in-memory databases is the need for data to be transferred back and forth between GPU memory and RAM through a low-bandwidth PCIe bus. To address this problem, in this study, a new generation of in-memory databases is proposed that instead of keeping data in main memory stores it in GPU device memory.

(Pedram Ghodsnia: “An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads”, VLDB 2012 PhD Workshop, Istanbul, Turkey, August 2012. [PDF])
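
The column-oriented layout described in the abstract can be illustrated with a small CPU-only sketch in Python. It is not the system proposed in the paper: the table, the column names and the function `sum_revenue_for_region` are hypothetical. In a GPU-resident design each column would presumably live in device memory and the scan would run as a parallel reduction.

```python
from array import array

# Hypothetical "sales" table stored column-wise:
# one contiguous typed array per attribute instead of one record per row.
sales = {
    "customer_id": array("i", [11, 12, 13, 14, 15, 16]),
    "region_id": array("i", [1, 2, 1, 3, 2, 1]),
    "quantity": array("i", [5, 3, 8, 1, 7, 2]),
    "price_cents": array("i", [250, 400, 250, 990, 400, 250]),
}


def sum_revenue_for_region(table, region_id):
    """Roughly: SELECT SUM(quantity * price_cents) WHERE region_id = ?.
    Only the three referenced columns are read; customer_id is never touched."""
    total = 0
    for r, q, p in zip(table["region_id"], table["quantity"], table["price_cents"]):
        if r == region_id:
            total += q * p
    return total


if __name__ == "__main__":
    print(sum_revenue_for_region(sales, 1))
```

Because an analytical query usually references a few attributes of many rows, scanning whole columns sequentially uses memory bandwidth far more efficiently than stepping through full records.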

Accelerating Computer Vision Algorithms Using OpenCL on The Mobile GPU

March 12th, 2013


Recently, general-purpose computing on graphics processing units (GPGPU) has been enabled on mobile devices thanks to the emerging heterogeneous programming models such as OpenCL. The capability of GPGPU on mobile devices opens a new era for mobile computing and can enable many computationally demanding computer vision algorithms on mobile devices. As a case study, this paper proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL. We discuss the methodology of exploring the parallelism in the algorithm as well as several optimization techniques. Experimental results demonstrate that our optimization strategies for mobile GPUs have significantly reduced the processing time and make computationally intensive computer vision algorithms feasible for a mobile device. To the best of the authors’ knowledge, this work is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.

(Guohui Wang, Yingen Xiong, Jay Yun and Joseph R. Cavallaro: “Accelerating Computer Vision Algorithms Using OpenCL on the Mobile GPU – A Case Study”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013, to appear. [PDF])

CfP: UnConventional High Performance Computing 2013

March 5th, 2013

As the word “UnConventional” in the title suggests, the workshop focuses on hardware or platforms used for HPC, which were not intended for HPC in the first place. Reasons could be raw computing power, good performance per watt, or low cost in general. To address this unconventional hardware, often, new programming approaches and paradigms are required to make the best use of it. A second focus of the workshop is on innovative, (yet) unconventional new programming models. To this end, UCHPC tries to capture solutions for HPC which are unconventional today but could become conventional and significant tomorrow, and thus provide a glimpse into the eventual future of HPC. The goal of the workshop is to present the latest research in how hardware and software (yet) unconventional for HPC is or can be used to reach goals such as best performance per watt. UCHPC also covers the corresponding programming models, compiler techniques, and tools.

UCHPC is held in conjunction with Euro-Par 2013, August 26 – August 30, Aachen, Germany. More information, including the full call for papers, submission instructions and important dates:

New GPU Computing Webinars

March 3rd, 2013

The following new webinars about NVIDIA Tesla K20 have been announced. During these live webinars, developers will be able to get answers directly from the presenters.

Call for Papers: AMD 2013 Developer Summit

February 25th, 2013

Calling all software development innovators in general purpose GPU (GPGPU), data parallel and heterogeneous computing. On November 11-14, 2013, AMD will host the AMD 2013 Developer Summit in San Jose, California. The AMD Developer Summit conference board has issued a call for presentation proposals, inviting creators of next-generation software to share research and development work through presentations based on the latest technical papers or reports.

The AMD Developer Summit will be a great venue for developers, academics and innovative entrepreneurs to network with others engaged in related work, collectively defining the future course of heterogeneous computing. And delivering a presentation offers you the perfect opportunity to advocate programming paradigms or gain support for industry standards.

The submission deadline is Mar. 15, 2013, and the submission website is available at:

PARALUTION – A fast, user-friendly library for sparse iterative methods on CPUs and GPUs

February 25th, 2013

PARALUTION is a library for sparse iterative methods with special focus on multi-core and accelerator technology such as GPUs. In particular, it incorporates fine-grained parallel preconditioners designed to exploit modern multi-/many-core devices. Based on C++, it provides a generic and flexible design and interface which allow seamless integration with other scientific software packages. The library is open source and released under GPL. Key features are:

  • OpenMP, CUDA and OpenCL support
  • No special hardware/library requirement
  • Portable code and results across all hardware
  • Many sparse matrix formats
  • Various iterative solvers/preconditioners
  • Generic and robust design
  • Plug-in for the finite element package Deal.II
  • Documentation: user manual (pdf), reports, doxygen

More information, including documentation and case studies, is available at
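
For readers unfamiliar with sparse iterative methods, here is a minimal Python sketch of an unpreconditioned conjugate gradient solver on a matrix in compressed sparse row (CSR) format. It does not use or imitate the PARALUTION API; all function names are hypothetical, and the solver assumes a symmetric positive definite matrix.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by the dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        # Non-zeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
    values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    row_ptr = [0, 2, 5, 7]
    print(conjugate_gradient(values, col_idx, row_ptr, [6.0, 10.0, 8.0]))
```

The cost of each iteration is dominated by the sparse matrix-vector product and a few vector operations, all of which are data parallel; this is the part that multi-core and GPU back ends accelerate.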

Lab4241 GP-GPU profiler

February 21st, 2013

A free, pre-alpha release of Lab4241's GPGPU profiler is now available. It provides source-line-level performance profiling for C or C++ code and CUDA kernels in a non-intrusive way. The profiler lets developers evaluate GPU resource usage (execution counts, memory accesses, branch divergence, etc.) per source line and inspect the results in a simple, intuitive GUI, much like established CPU profilers such as Quantify or valgrind.

Call for papers: ADBIS workshop on GPUs In Databases (GID 2013)

February 12th, 2013

The high performance of modern Graphics Processing Units may be utilized not only for graphics-related applications but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases (GID) workshop is to fill this gap. This event is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future development of this application domain. The list of topics of the GID workshop includes (but is not limited to):

  • Data compression on GPUs
  • GPUs in databases and data warehouses
  • Data mining using GPUs
  • Stream processing
  • Applications of GPUs in bioinformatics
  • Data oriented GPU primitives

For details please visit

Free online course on parallel programming on Udacity

February 10th, 2013

This class teaches the fundamentals of parallel computing with the GPU and the CUDA programming environment. Examples are based on a series of image processing algorithms, such as those in Photoshop or Instagram. Programming and running assignments on high-end GPUs is possible, even if you don’t own one yourself. The course started Monday 4th Feb 2013 so there is still time to join. More information and enrollment:
