CUVILib v1.2 released

May 17th, 2012

TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications. CUVILib adds acceleration to imaging applications in the medical, industrial and defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaicing, conversions, correction, etc.), linear/non-linear filtering, feature extraction and tracking, motion estimation, image transforms and image statistics.

More information, including a free trial version:

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform

May 11th, 2012


Motivation: New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed.

Results: We present CUSHAW, a parallelized short read aligner based on the compute unified device architecture (CUDA) parallel programming model. We exploit CUDA-compatible graphics hardware as accelerators to achieve high speed. Our algorithm employs a quality-aware bounded search approach based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM)-index to reduce the search space and achieve high alignment quality. Performance evaluation, using simulated as well as real short read datasets, reveals that our algorithm running on one or two graphics processing units (GPUs) achieves significant speedups in terms of execution time, while yielding comparable or even better alignment quality for paired-end alignments compared to three popular BWT-based aligners: Bowtie, BWA and SOAP2. CUSHAW also delivers competitive performance in terms of SNP calling for an E. coli test dataset.


(Y. Liu, B. Schmidt, D. Maskell: “CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform”, Bioinformatics, 2012. [DOI])
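To illustrate how an FM-index narrows the search space, here is a minimal Python sketch of exact-match backward search. It is not CUSHAW's implementation: it builds the BWT from a naively sorted suffix array, keeps uncompressed occurrence tables, and only finds exact matches, whereas CUSHAW's quality-aware bounded search also tolerates mismatches. The names `FMIndex`, `backward_search` and `locate` are hypothetical.

```python
def bwt_from_suffix_array(text):
    """Return (bwt, suffix_array) for a text that ends with the sentinel '$'."""
    # Naive O(n^2 log n) suffix sort; real aligners use far more efficient builders.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character is the one preceding each sorted suffix (text[-1] == '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


class FMIndex:
    """Hypothetical, uncompressed FM-index supporting exact backward search."""

    def __init__(self, text):
        if not text.endswith("$"):
            text += "$"
        self.bwt, self.sa = bwt_from_suffix_array(text)
        # C[c] = number of characters in the text lexicographically smaller than c.
        self.C = {}
        total = 0
        for c in sorted(set(text)):
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def backward_search(self, pattern):
        """Return the suffix-array interval [lo, hi) of suffixes starting with pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):  # extend the match one character to the left
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:  # interval became empty: no occurrence
                return 0, 0
        return lo, hi

    def locate(self, pattern):
        """Return the sorted start positions of pattern in the original text."""
        lo, hi = self.backward_search(pattern)
        return sorted(self.sa[lo:hi])
```

For example, `FMIndex("banana").locate("ana")` returns `[1, 3]`. Each pattern character shrinks the suffix-array interval with two occurrence-table lookups, so the search cost grows with the read length rather than with the size of the reference genome.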

Acceleware CUDA™ Training – Life Science Focus

May 2nd, 2012

Partnering with NVIDIA and Microsoft, Acceleware offers this four-day CUDA training course, designed for researchers and programmers in the life science industries who want to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. It will be held in Boston, MA, on June 4-7, 2012. The course has a life science theme: commonly used algorithms such as Monte Carlo methods, FFT and filtering are implemented and profiled in the examples, and the case study on day 4 focuses on the efficient implementation of a molecular dynamics simulation. More information:

OpenCL SDK for new Intel Core Processors

April 27th, 2012

The Intel® SDK for OpenCL Applications now supports the OpenCL 1.1 full-profile on 3rd generation Intel® Core™ processors with Intel® HD Graphics 4000/2500. For the first time, OpenCL developers using Intel® architecture can utilize compute resources across both Intel® Processor and Intel HD Graphics. More information:

New Libra Platform version released

April 21st, 2012

Libra Platform is a GPGPU heterogeneous compute API and runtime environment available on Windows, Mac and Linux. The Libra Compute API offers performance portability and direct compute access from standard programming environments (C/C++, Java, C# and Matlab) for executing math operations on current and future compute architectures, including the latest GPUs and x86/x64 CPUs, with broad support for compute devices compatible with lower-level APIs such as OpenCL, CUDA, OpenGL and standard x86/x64 compute APIs.

Read more in the full announcement.

2 Day CUDA Workshop, May 5-6 2012, Berlin, Germany

April 21st, 2012

A two-day CUDA workshop is taking place in Berlin, Germany, on May 5 and 6, 2012. Course details, outline and prices are available at

New rCUDA version beta testing

April 18th, 2012

The rCUDA Team is proud to announce a new version of the rCUDA framework, which will include many new features as well as improved performance. This new version, in development for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native InfiniBand support. A closed beta testing program has been started. See the complete text at

Scalable GPU graph traversal

April 17th, 2012


Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter.

We present a BFS parallelization focused on fine-grained task management constructed from efficient prefix sums that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.

(Duane Merrill, Michael Garland and Andrew Grimshaw: “Scalable GPU graph traversal”, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), pp. 117-128, February 2012. [DOI])
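The idea of building BFS from prefix sums can be illustrated with a sequential Python emulation of a level-synchronous traversal. This is a minimal sketch under our own assumptions, not the authors' CUDA code: the graph is stored in CSR form (the names `row_offsets`, `col_indices`, `exclusive_scan` and `bfs_csr` are hypothetical), an exclusive prefix sum over frontier degrees gives each frontier vertex its write offset for gathering neighbors, and a second scan compacts the newly discovered vertices into the next frontier.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = sum(values[:i]). Also returns the grand total."""
    out = [0, *accumulate(values)]
    return out[:-1], out[-1]


def bfs_csr(row_offsets, col_indices, source):
    """Level-synchronous BFS over a CSR graph; returns hop distances (-1 = unreachable)."""
    n = len(row_offsets) - 1
    dist = [-1] * n
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # Expand: a scan over degrees tells each frontier vertex where to write its neighbors.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        offsets, total = exclusive_scan(degrees)
        gathered = [0] * total
        for v, base in zip(frontier, offsets):  # one "thread" per frontier vertex
            begin, end = row_offsets[v], row_offsets[v + 1]
            gathered[base:base + (end - begin)] = col_indices[begin:end]

        # Contract: flag unvisited neighbors, then compact them with a second scan.
        # Sequential stand-in for a parallel step: a GPU kernel would need atomics
        # or duplicate-tolerant culling to claim vertices safely.
        flags = [0] * total
        for i, u in enumerate(gathered):
            if dist[u] == -1:
                dist[u] = level + 1  # claim u so later duplicates are culled
                flags[i] = 1
        positions, count = exclusive_scan(flags)
        next_frontier = [0] * count
        for i, u in enumerate(gathered):
            if flags[i]:
                next_frontier[positions[i]] = u
        frontier = next_frontier
        level += 1
    return dist
```

For the undirected graph with edges 0-1, 0-2, 1-3, 2-3 and 3-4, `bfs_csr([0, 2, 4, 6, 9, 10], [1, 2, 0, 3, 0, 3, 1, 2, 4, 3], 0)` returns `[0, 1, 1, 2, 3]`. Because a vertex is claimed before it enters a frontier, every vertex is expanded at most once and every adjacency list is scanned once, which keeps the total work at O(|V|+|E|).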

Acceleware 4 Day CUDA™ Course, Calgary

April 17th, 2012

Partnering with NVIDIA, Acceleware offers this four-day course (May 8-11, 2012), designed for programmers who want to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.

Delivered by Acceleware developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.

More information:

CFP: Deadline Extension – UKPEW 2012 – The 28th UK Performance Engineering Workshop

April 10th, 2012

UKPEW is the leading UK forum for the presentation of all aspects of performance modeling and analysis of computer and telecommunication systems. Original papers are invited on all relevant topics but papers on or related to the subjects listed below are particularly welcome.

The paper submission deadline has just been extended to April 20, 2012. The conference takes place June 2 and 3, 2012, in Edinburgh, UK. More Information:
