OpenACC Compilers Now Available from PGI

July 27th, 2012

PGI Release 12.6 is now out. New in this release:

  • PGI Accelerator compilers — first release of the Fortran and C compilers to include comprehensive support for the OpenACC 1.0 specification including the acc cache construct and the entire OpenACC API library. See the PGI Accelerator page for a complete list of supported features.
  • CUDA Toolkit — PGI Accelerator compilers and CUDA Fortran now include support for CUDA Toolkit version 4.2; version 4.1 is now the default.

Download a free trial from the PGI website at Upcoming PGI webinar with Michael Wolfe. 9:00AM PDT, July 31st sponsored by NVIDIA: “Using OpenACC Directives with the PGI Accelerator Compilers”. Register at

C++ AMP Courses

May 26th, 2012

C++ Accelerated Massive Parallelism (C++ AMP) is a new open specification heterogeneous programming model, which builds on the established C++ language. Developed for heterogeneous platforms C++ AMP is designed to accelerate the execution of C++ code by taking advantage of the data-parallel hardware that is commonly present as a GPU. These courses are aimed at programmers who are looking to develop comprehensive skills in writing and optimizing applications using C++ AMP. Read the rest of this entry »

Accelerate OpenFOAM with SpeedIT 2.1

May 24th, 2012

SpeedIT provides a set of accelerated solvers for sparse linear systems of equations. The library supports C/C++ and Fortran, and it can be used with OpenFOAM to accelerate CFD simulations. SpeedIT 2.1 contains two new preconditioners:

• Algebraic Multigrid with Smoothed Aggregation (AMG)
• Approximate Inverse (AINV)

OpenFOAM simulations on the GPU can be up to 3.5x faster compared to CG and DIC/DILU preconditioners on the CPU and up to 1.6x faster if you run GAMG.

See the SpeedIT website and blog for more details.

NVIDIA Kepler GK110 Architecture White Paper

May 20th, 2012

NVIDIA Kepler GK110 Die Shot

This white paper describes the new Kepler  GK110 Architecture from NVIDIA.

Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market.

Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60‐65% on the prior Fermi architecture.

In addition to greatly improved performance, the Kepler architecture offers a huge leap forward in power efficiency, delivering up to 3x the performance per watt of Fermi.

The paper describes features of the Kepler GK110 architecture, including

  • Dynamic Parallelism;
  • Hyper-Q;
  • Grid Management Unit;
  • NVIDIA GPUDirect™;
  • New SHFL instruction and atomic instruction enhancements;
  • New read-only data cache previously only accessible to texture;
  • Bindless Textures;
  • and much more.

CUVILib v1.2 released

May 17th, 2012

TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications. CUVILib adds acceleration to Imaging applications from Medical, Industrial and Defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction etc), linear/non-linear filtering, feature extraction & tracking, motion estimation, image transforms and image statistics.

More information, including a free trial version:

New Libra Platform version released

April 21st, 2012

Libra Platform is a GPGPU-Heterogeneous Compute API and runtime environment available on Windows, Mac and Linux. Libra Compute API offers performance portability and direct compute access via standard programming environments C/C++, Java, C# and Matlab to execute math operations on top of current and future compute architectures, including the latest GPUs, x86/x64 CPUs and with broad support for compute devices compatible with low level specific APIs – OpenCL, CUDA, OpenGL and standard x86/x64 compute APIs.

Read more in the full announcement.

2 Day CUDA Workshop, May 5-6 2012, Berlin, Germany

April 21st, 2012

A 2 day CUDA workshop is taking place in Berlin, Germany on May 5 and 6 2012. Course details, outline and prices are available at

Acceleware OpenCL™ Training in NYC

February 28th, 2012

Developed in partnership with AMD, this four day course is designed for GPU Programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the multi-core processing capabilities of the GPU.

Delivered by Acceleware’s Developers, who provide real world experience and examples, the training comprises classroom lectures and hands-on tutorials. Each student will be supplied with a laptop equipped with an AMD Fusion APU for the duration of the course. Small class sizes maximize learning and ensure a personal educational experience. Read the rest of this entry »

SpeedIT 2.0 released

February 24th, 2012

SpeedIT 2.0 and the SpeedIT plugin to OpenFOAM have been released. New features include:

  • One of the fastest Sparse Matrix Vector Multiplication worldwide.
  • Faster Conjugate Gradient and BiConjugate Gradient solvers.
  • State-of-the-art CMRS format for storing sparse matrices. The format requires less memory than CRS or HYB (from CUSPARSE and CUSP).
  • Faster acceleration in OpenFOAM (Computational Fluid Dynamics).

More information is available at

Performance of SpMV in CUSPARSE, CUSP and SpeedIT

January 14th, 2012

The SpeedIt team recently compared and benchmarked the SpMV performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from University Florida Matrix Collection. Comparisons were done on a Tesla C2050 in single and double precision. The full report is available at

Page 3 of 1312345...10...Last »