CUVILib v1.2 released

May 17th, 2012

TunaCode has released CUVILib v1.2, a library to accelerate imaging and computer vision applications, including imaging applications in the medical, industrial and defense domains. It is designed for high performance and supports both CUDA and OpenCL. Modules include color operations (demosaicing, conversions, correction, etc.), linear/non-linear filtering, feature extraction and tracking, motion estimation, image transforms and image statistics.

More information, including a free trial version: http://www.cuvilib.com/

New rCUDA version beta testing

April 18th, 2012

The rCUDA Team is proud to announce a new version of the rCUDA framework, which will include many new features as well as improved performance. This new version, in development for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native InfiniBand support. A closed beta testing program has been started. See the complete announcement at http://www.rcuda.net/index.php/news/19-new-revolutionary-version-of-rcuda-to-be-launched.html.

SpeedIT 2.0 released

February 24th, 2012

SpeedIT 2.0 and the SpeedIT plugin for OpenFOAM have been released. New features include:

  • One of the fastest sparse matrix-vector multiplication (SpMV) implementations worldwide.
  • Faster Conjugate Gradient and BiConjugate Gradient solvers.
  • State-of-the-art CMRS format for storing sparse matrices. The format requires less memory than CRS or HYB (from CUSPARSE and CUSP).
  • Greater speedups in OpenFOAM (Computational Fluid Dynamics).
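The CMRS layout itself is not described in this announcement, so the following is only a minimal sequential Python sketch of the standard CRS (compressed row storage, also called CSR) format it is compared against, together with the SpMV that CRS supports. It is not SpeedIT code; `dense_to_crs` and `crs_spmv` are hypothetical helper names used for illustration.

```python
def dense_to_crs(dense):
    """Convert a dense row-major matrix (list of lists) to CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # one-past-the-end of this row's nonzeros
    return values, col_idx, row_ptr


def crs_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CRS.

    The outer-loop iterations are independent of each other, which is what
    GPU SpMV kernels parallelize over (e.g. one thread or one warp per row).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

For A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]] and x = [1, 2, 3], `crs_spmv(*dense_to_crs(A), x)` gives [7.0, 6.0, 17.0]. CRS stores only the nonzeros plus one index per nonzero and one pointer per row, which is the memory footprint that formats such as HYB and CMRS try to reduce further.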

More information is available at http://speed-it.vratis.com.

New CLOGS library with sort and scan primitives for OpenCL

February 5th, 2012

CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently only two operations are supported: radix sorting and exclusive scan. Radix sort supports all the unsigned integral types as keys, and all the built-in scalar and vector types suitable for storage in buffers as values. Scan supports all the integral types. It also supports vector types, which allows for limited multi-scan capabilities.
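To make the two primitives concrete, here is a minimal sequential Python reference for exclusive scan and for a least-significant-digit radix sort of key/value pairs. The sort is built on a scan of per-digit counts, a common way GPU radix sorts are structured. It shows the semantics only: the function names are hypothetical, and this is not the CLOGS API or its OpenCL kernels.

```python
def exclusive_scan(xs, init=0):
    """Exclusive prefix sum: out[i] = init + xs[0] + ... + xs[i-1]."""
    out, running = [], init
    for x in xs:
        out.append(running)
        running += x
    return out


def radix_sort_pairs(keys, values, key_bits=32, radix_bits=4):
    """Stable LSD radix sort of unsigned integer keys (< 2**key_bits), carrying values along."""
    radix = 1 << radix_bits
    mask = radix - 1
    for shift in range(0, key_bits, radix_bits):
        # 1. Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive scan turns the counts into each digit's first output slot.
        offsets = exclusive_scan(counts)
        # 3. Stable scatter: equal digits keep their relative order.
        out_keys = [0] * len(keys)
        out_vals = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_vals[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_vals
    return keys, values
```

For example, `exclusive_scan([3, 1, 4, 1, 5])` returns [0, 3, 4, 8, 9], and `radix_sort_pairs([170, 45, 75, 90, 2, 45], list("abcdef"), key_bits=8)` returns keys [2, 45, 45, 75, 90, 170] with values ['e', 'b', 'f', 'c', 'd', 'a']; the two 45s keep their original order because every pass is stable.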

Version 1.0 of the library has just been released. The home page is http://clogs.sourceforge.net/

ViennaCL 1.2.0 released

January 2nd, 2012

Version 1.2.0 of the OpenCL-based C++ linear algebra library ViennaCL is now available for download! It features a high-level interface compatible with Boost.ublas, which allows for compact code and high productivity. Highlights of the new release are the following features (all experimental):

  • Several algebraic multigrid preconditioners
  • Sparse approximate inverse preconditioners
  • Fast Fourier transform
  • Structured dense matrices (circulant, Hankel, Toeplitz, Vandermonde)
  • Reordering algorithms (Cuthill-McKee, Gibbs-Poole-Stockmeyer)
  • Proxies for manipulating subvectors and submatrices
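As an illustration of the first reordering algorithm listed, the following minimal Python sketch computes a Cuthill-McKee ordering of a symmetric sparsity pattern: a breadth-first traversal that starts from a minimum-degree node and visits neighbours in order of increasing degree, which tends to reduce matrix bandwidth. It is not ViennaCL's implementation; `cuthill_mckee` and `bandwidth` are hypothetical helpers, and the starting-node choice is a simple heuristic that may differ from the library's.

```python
from collections import deque


def cuthill_mckee(adj):
    """Return a Cuthill-McKee ordering of the nodes of a symmetric sparsity graph.

    adj[i] lists the neighbours of node i, i.e. the columns j != i with A[i][j] != 0.
    """
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Handle every connected component, each starting from an unvisited
    # node of minimum degree.
    for start in sorted(range(n), key=lambda i: degree[i]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Enqueue unvisited neighbours in order of increasing degree.
            for nb in sorted(adj[node], key=lambda j: degree[j]):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    return order


def bandwidth(adj, order):
    """Largest |pos(i) - pos(j)| over all edges after renumbering nodes by `order`."""
    pos = {node: p for p, node in enumerate(order)}
    return max((abs(pos[i] - pos[j]) for i in range(len(adj)) for j in adj[i]), default=0)
```

For the path 0-3-1-4-2 given as `adj = [[3], [3, 4], [4], [0, 1], [1, 2]]`, the original numbering has bandwidth 3, while the ordering [0, 3, 1, 4, 2] returned by `cuthill_mckee(adj)` has bandwidth 1. Reversing the returned list gives the reverse Cuthill-McKee ordering, which has the same bandwidth and is often preferred because it tends to produce less fill-in during factorization.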

The features are expected to reach maturity in the 1.2.x branch. More information about the library including download links is available at http://viennacl.sourceforge.net.

FortranCL: An OpenCL interface for Fortran 90

December 30th, 2011

FortranCL is an interface to OpenCL from Fortran 90 programs, and it is distributed under the LGPL free software license. It allows Fortran programmers to directly execute code on GPUs or other massively parallel processors. The interface is designed to be as close to the C OpenCL interface as possible, and it is written in native Fortran 90 with type checking. FortranCL is not complete yet, but it includes enough subroutines to write GPU-accelerated code in Fortran. More information: http://code.google.com/p/fortrancl/

GPU Virtualization for Dynamic GPU Provisioning

November 18th, 2011

From a recent press release:

Taipei, November 18, 2011: Zillians, a leading cloud solution provider specializing in high performance computing, GPU virtualization middleware and massive multi-player online game (MMOG) platforms today announced the availability of vGPU – the world’s first commercial virtualization solution for decoupling GPU hardware from software. Traditionally, physical GPUs must reside on the same machine running GPU code. This severely hampers GPU cloud deployment due to the difficulty of dynamic GPU provisioning. With vGPU technology, bulky hardware is no longer a limiting factor. vGPU introduces a thin, transparent RPC layer between local application and remote GPU, enabling existing GPU software to run without any modification on a remote GPU resource.
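The release describes vGPU as a thin, transparent RPC layer between the application and a remote GPU. The toy Python sketch below illustrates only that general pattern: a client-side stub serializes each call, and a server dispatches it to a real implementation. It is not Zillians' protocol. `GpuServer`, `RemoteProxy` and the `vector_add` handler are hypothetical, and the "transport" is a plain function standing in for a network connection.

```python
import pickle


class GpuServer:
    """Runs on the machine that owns the GPU and dispatches incoming requests.

    The registered handlers are hypothetical stand-ins for GPU runtime calls.
    """

    def __init__(self):
        self._handlers = {}

    def register(self, name, fn):
        self._handlers[name] = fn

    def handle(self, request_bytes):
        name, args, kwargs = pickle.loads(request_bytes)
        try:
            result = ("ok", self._handlers[name](*args, **kwargs))
        except Exception as exc:  # report failures to the caller instead of crashing the server
            result = ("error", repr(exc))
        return pickle.dumps(result)


class RemoteProxy:
    """Client-side stub: calling any attribute becomes a remote call."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning response bytes.
        # A real deployment would send these over a network; here it is any function.
        self._transport = transport

    def __getattr__(self, name):
        def call(*args, **kwargs):
            request = pickle.dumps((name, args, kwargs))
            status, payload = pickle.loads(self._transport(request))
            if status == "error":
                raise RuntimeError(payload)
            return payload
        return call
```

Usage: after `server = GpuServer()`, `server.register("vector_add", lambda a, b: [x + y for x, y in zip(a, b)])` and `gpu = RemoteProxy(server.handle)`, the call `gpu.vector_add([1, 2, 3], [10, 20, 30])` looks local but is marshalled through the transport and returns [11, 22, 33]. This "unmodified application" property comes from the stub presenting the same interface as the local call. A production layer would use a real network transport and a safe serialization format; `pickle` must never be used on data from untrusted peers.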

CULA Sparse Now Available

November 10th, 2011

EM Photonics has released CULA Sparse, a ready-to-integrate package for solving sparse linear systems. Features include:

  • Interfaces: C, C++, Fortran, Matlab, Python
  • Platforms: all CUDA platforms, including Linux, Windows, and OS X
  • Solvers: BiCG, BiCGStab, CG, GMRES, MINRES; preconditioners: Jacobi, ILU(0)
  • Data formats: COO, CSR, CSC in double precision real and complex floating point
  • No CUDA programming experience required.
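To show what one of the listed solver/preconditioner pairs computes, here is a minimal sequential Python sketch of the conjugate gradient method with a Jacobi (diagonal) preconditioner for symmetric positive definite systems. It is not the CULA Sparse API. `jacobi_pcg` is a hypothetical name, and the matrix is stored dense only to keep the example short.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)

    def matvec(v):
        # Dense product A v; a sparse solver would use COO/CSR/CSC here instead.
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Jacobi preconditioner: M = diag(A), so applying M^-1 is an element-wise scaling.
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = [float(bi) for bi in b]  # residual b - A x for the initial guess x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if dot(r, r) ** 0.5 < tol:  # absolute residual-norm stopping test
            break
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz  # makes the next direction A-conjugate to the previous one
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

For A = [[4, 1], [1, 3]] and b = [1, 2], the exact solution is x = [1/11, 7/11], and CG reaches it (up to rounding) in two iterations, since in exact arithmetic it converges in at most n steps. The preconditioner costs one division per row, which is why Jacobi is a common first choice on GPUs even though ILU(0) usually reduces the iteration count more.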

More information is available at http://www.culatools.com/sparse.

rCUDA 3.1 Released

October 20th, 2011

The new version 3.1 of rCUDA (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, is now available. Release highlights:

  • API fully updated to CUDA 4.0 (added support for the “Peer Device Memory Access” and “Unified Addressing” modules).
  • Fixed low-level Surface Reference management functions.

For further information, please visit the rCUDA webpage at http://www.gap.upv.es/rCUDA.

Thrust: A Productivity-Oriented Library for CUDA

September 12th, 2011

Abstract:

This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.

(Nathan Bell and Jared Hoberock: “Thrust: A Productivity-Oriented Library for CUDA”, GPU Computing Gems, Jade Edition, edited by Wen-mei W. Hwu, October 2011)
