Performance of SpMV in CUSPARSE, CUSP and SpeedIT

January 14th, 2012

The SpeedIT team recently benchmarked the sparse matrix-vector multiplication (SpMV) performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. Comparisons were run on a Tesla C2050 in single and double precision. The full report is available at
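
For readers unfamiliar with the operation being benchmarked, here is a minimal serial sketch of SpMV, y = A·x. It assumes the matrix is stored in compressed sparse row (CSR) format; the post does not say which storage formats were tested, and the function name `spmv_csr` is only illustrative. GPU libraries parallelize the outer loop over rows in various ways; this shows only the arithmetic.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative helper)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, vals, x)
print(y)  # [7.0, 4.0, 18.0]
```

Each row does only as much work as it has nonzeros, so performance depends heavily on the sparsity pattern, which is why such benchmarks are run over a collection of matrices.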

Acceleware 4 Day CUDA Course

January 6th, 2012

Offered in partnership with NVIDIA and Microsoft, this four-day course is designed for programmers who want to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.

Delivered by Acceleware’s developers, who provide real-world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with NVIDIA GPUs for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience.

Register before January 13 and receive $250 off your course fee!
Enter promotional code AXTEB2012

ViennaCL 1.2.0 released

January 2nd, 2012

Version 1.2.0 of the OpenCL-based C++ linear algebra library ViennaCL is now available for download! It features a high-level interface compatible with Boost.uBLAS, which allows for compact code and high productivity. The release highlights are the following features, all of them experimental:

  • Several algebraic multigrid preconditioners
  • Sparse approximate inverse preconditioners
  • Fast Fourier transform
  • Structured dense matrices (circulant, Hankel, Toeplitz, Vandermonde)
  • Reordering algorithms (Cuthill-McKee, Gibbs-Poole-Stockmeyer)
  • Proxies for manipulating subvectors and submatrices

The features are expected to reach maturity in the 1.2.x branch. More information about the library including download links is available at

FortranCL: An OpenCL interface for Fortran 90

December 30th, 2011

FortranCL is an interface to OpenCL from Fortran 90 programs, distributed under the LGPL free software license. It allows Fortran programmers to execute code directly on GPUs or other massively parallel processors. The interface is designed to be as close to the C OpenCL interface as possible, and it is written in native Fortran 90 with type checking. FortranCL is not yet complete, but it includes enough subroutines to write GPU-accelerated code in Fortran. More information:

HOOMD-blue 0.10.0 release

December 19th, 2011

HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants; dissipative particle dynamics (DPD) simulations of polymers; and crystallization of metals.

HOOMD-blue 0.10.0 adds many new features.

Intel SPMD Compiler Version 1.1 Released

December 7th, 2011

A major new release of the Intel SPMD Program Compiler (ispc) was posted on December 5, 2011. ispc is an extended version of the C programming language with support for “single program, multiple data” (SPMD) programming on the CPU; the SPMD model makes it easy to harness the full power of both the SIMD vector units and multiple cores on modern CPUs. The major features added in the 1.1 release include:

  • Full support for pointers, including pointer arithmetic, function pointers, and all other features of pointers in C.
  • A new parallel “foreach” statement, for more easily mapping computation to data.
  • Substantially revised documentation, including a new Performance Guide.
  • Many other small bug fixes and improvements.

ispc is open-source and is licensed under the BSD license. Source and binaries are available from

CUDA 4.1 RC2 Released

December 6th, 2011

The NVIDIA CUDA Toolkit 4.1 RC2 is now available for anyone to download. The key features of this release are:

  • A new LLVM-based compiler
  • Over 1000 additional image processing functions in the NPP library
  • A visual profiler

There is also a new version of Parallel Nsight, 2.1 RC2, with support for CUDA 4.1. To download and to find out more follow:

Introduction to Generic Accelerated Computing with Libra SDK

November 30th, 2011

Libra SDK is a sophisticated runtime, including an API, sample programs and documentation, for massively accelerating software computations. This introductory tutorial provides an overview and usage examples of the Libra API and math libraries executing on x86/x64, OpenCL, OpenGL and CUDA technology. The Libra API enables generic and portable CPU/GPU computing without the need to create multiple, specific and optimized code paths to support x86, OpenCL, OpenGL or CUDA devices. Link to PDF:

KOAP: Kentucky OpenCL Application Preprocessor

November 29th, 2011

KOAP, pronounced “cope,” is a tool for developing OpenCL applications. Its purpose is to allow the programmer to aggregate and simplify calls to the OpenCL API. KOAP accepts as input a file containing (or including) both the OpenCL program and the host C program. KOAP understands several directives, each of which is prefixed with a $ character. When KOAP is run, these directives are replaced with the requisite OpenCL API calls. Programs preprocessed by KOAP can run on any target supported by OpenCL, including both NVIDIA and AMD GPUs.

KOAP is now freely available as a source code tar file from

Alenka – A GPU database engine including compression

November 28th, 2011

Support for several types of compression has been added to the GPU-based database engine ålenkå. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression makes it possible to significantly reduce or eliminate I/O bottlenecks in analytical queries, as shown by ålenkå’s results in the Star Schema and TPC-H benchmarks.
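
The three schemes named above can be sketched on the CPU in a few lines of Python. This is a minimal illustration of the general techniques, not ålenkå's GPU implementation: all function names are hypothetical, inputs are assumed to be non-empty lists of integers, and FOR-DELTA is assumed here to mean frame-of-reference encoding applied to successive differences.

```python
def for_encode(values):
    """Frame of reference: keep the minimum once, plus small non-negative offsets."""
    ref = min(values)
    offsets = [v - ref for v in values]
    bits = max(offsets).bit_length()  # bits needed to store each offset
    return ref, offsets, bits


def for_decode(ref, offsets):
    return [ref + o for o in offsets]


def delta_encode(values):
    """Keep the first value, then the difference between each pair of neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def dict_encode(values):
    """Dictionary compression: replace each value by its index in a sorted dictionary."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


# Made-up sample column of sorted keys.
keys = [100000, 100003, 100004, 100010]

ref, offsets, bits = for_encode(keys)
print(ref, offsets, bits)  # 100000 [0, 3, 4, 10] 4

# FOR-DELTA (as assumed here): delta-encode, then apply FOR to the differences.
deltas = delta_encode(keys)
print(deltas)  # [100000, 3, 1, 6]
d_ref, d_offsets, d_bits = for_encode(deltas[1:])
print(d_ref, d_offsets, d_bits)  # 1 [2, 0, 5] 3

dictionary, codes = dict_encode(["EU", "ASIA", "EU", "EU"])
print(dictionary, codes)  # ['ASIA', 'EU'] [1, 0, 1, 1]
```

In each case the encoded column needs fewer bits per value than the original, which is what reduces the volume of data read from disk during a query.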
