Thrust v1.7 Released

July 4th, 2013

The Thrust team is pleased to announce the release of Thrust v1.7, an open-source C++ library for developing high-performance parallel applications. Modeled after the C++ Standard Template Library, Thrust brings a familiar abstraction layer to the realm of parallel computing.

Thrust 1.7.0 introduces a new interface for controlling algorithm execution, as well as several new algorithms and performance improvements. With this new interface, users may directly control how algorithms execute, along with details such as the allocation of temporary storage. Key/value versions of thrust::merge and the set operation algorithms have been added, as well as stencil versions of the partitioning algorithms. For 32-bit types, the new CUDA merge and set operations are 2-15x faster, while a new CUDA comparison sort is 1.3-4x faster.

Thrust is open-source software distributed under the OSI-approved Apache License 2.0.

MC# 3.0 with GPU support

July 22nd, 2012

Version 3.0 of the MC# programming system has been released. MC# is a universal parallel programming language aimed at any parallel architecture: multicore processors, systems with GPUs, or clusters. It is an extension of the C# language and supports a high-level parallel programming style.

VexCL: Vector expression template library for OpenCL

May 30th, 2012

VexCL is a vector expression template library for OpenCL developed by the Supercomputer Center of the Russian Academy of Sciences. It was created to ease C++-based OpenCL development. Multi-device (and multi-platform) computations are supported. The code is publicly available under the MIT license.

Main features:

  • Selection and initialization of compute devices according to an extensible set of device filters.
  • Transparent allocation of device vectors spanning multiple devices.
  • Convenient notation for vector arithmetic, sparse matrix-vector multiplication, reductions. All computations are performed in parallel on all selected devices.
  • Appropriate kernels for vector expressions are generated automatically the first time an expression is used.

Doxygen-generated documentation is available at

Solving ordinary differential equations with CUDA

August 8th, 2011

Odeint is a high-level C++ library for solving ordinary differential equations. It is released under an open-source license and supports a variety of methods for solving ODEs. As a special feature, it supports different algebras, which perform the basic mathematical operations. This allows the user to solve ordinary differential equations on modern graphics cards. A Thrust interface is implemented, so the power of CUDA can easily be employed. Furthermore, arbitrary-precision types can easily be supported.

GPU.NET v2.0 released

July 29th, 2011

TidePowerd has released Version 2 of their GPU computing solution for the .NET framework, GPU.NET. Their platform allows developers to quickly and easily write GPU-accelerated applications completely in .NET-based languages. Some key benefits include:

  • Stay in C# and treat kernel methods like any regular method
  • “Boilerplate” GPU programming tasks such as memory transfer and GPU scheduling are abstracted from the developer
  • Cross-platform and cross-hardware with a single binary
  • Systems seamlessly adapt to new hardware without rewriting code
  • Speed on par with native code

New version 2 features:

  • Visual Studio Error list and IntelliSense integration
  • On-device random number generation
  • Double precision support

A free 30-day evaluation license is available, as well as in-depth examples and tutorials.

GPU Computing and C++: An Evening with Microsoft and NVIDIA

June 26th, 2011

In Silicon Valley? Interested in C++? Join Microsoft & NVIDIA for an evening discussing new C++ technology for parallel computing. Register here:

  • 5:45 PM Welcome & Registration
  • 6:00 PM Heterogeneous Parallelism in General, C++ AMP in Particular, presented by Herb Sutter, Principal Architect for Windows C++, Microsoft
  • 7:15 PM ALM tools for C++ in Visual Studio V.NEXT, presented by Rong Lu, Program Manager C++, Microsoft
  • 8:00 PM The Power of Parallel, presented by the NVIDIA Team:
    • Parallel Nsight: Programming GPUs in Visual Studio, Stephen Jones, NVIDIA;
    • CUDA 4.0: Parallel Programming Made Easy, Justin Luitjens, NVIDIA;
    • Thrust: C++ Template Library for GPGPUs, Jared Hoberock, NVIDIA

Refreshments provided.

CUDAfy – GPGPU completely in .NET

March 21st, 2011

From a recent press release:

CUDAfy is a .NET SDK that allows you to write, debug and emulate CUDA GPU applications in any .NET language including C# or Visual Basic. The aim is to bring the power of GPGPU to the large number of .NET developers out there. Features include:

  • .NET object-oriented CUDA model (GThread)
  • Write .NET code marking methods, structures and constants that should be translated to CUDA (“Cudafying”)
  • An add-in for Red Gate’s .NET Reflector tool that translates to CUDA C
  • Built in emulation of GPU kernel functions
  • 1D, 2D and 3D array support including access to Array class’s Length, GetLength and Rank members
  • Use all standard .NET value types; no new types are needed, even for managing data allocated on the GPU
  • Simple .NET wrapper for CUFFT and CUBLAS

During our work with the European Space Agency, Astrium and NLR we saw how GPUs could significantly improve performance of the emulation of algorithms targeted at FPGAs and ASICs. The SDEs and SDKs produced were .NET-based, and CUDAfy is the result of efforts to more tightly integrate GPU and CPU code development. There are user guides and sample projects. Many of the samples in the book CUDA by Example have been ported to .NET. See for downloads and more information.

Brahma: Shader meta-programming framework for GPUs

December 13th, 2006

Brahma is an open-source shader meta-programming framework for the .NET platform that generates shader code from IL at runtime, enabling developers to write GPU code in C# (or any .NET language). The library is primarily meant to handle GPU-based rendering and computational tasks, and it eliminates a great deal of the glue code that is often required in GPU programming. Since Brahma is a set of interfaces and base classes, it can be implemented for any combination of API and shading language. At this time there is a working shader generation path for Managed DirectX/HLSL.