Announcing GPU.NET from TidePowerd: “Native” GPU computing for .NET

December 14th, 2010

The “Beta 2” version of GPU.NET, a new product by TidePowerd, has recently been released. It allows developers to write GPU-based code in C# or other .NET-supported languages. The GPU.NET beta is available for public download, and the full documentation and several example projects are available online.

CUDA Programming with Ruby

September 27th, 2010

SpeedGo Computing recently announced their development of CUDA bindings for Ruby. Currently, only part of the CUDA Driver API is included. More components such as the CUDA Runtime API will be added to make it as complete as possible. More details as well as sample code can be found in this blog post.

CLyther 0.1 Beta Released

April 25th, 2010

GeoSpin has released the first version of CLyther for beta testing. Please visit the CLyther SourceForge website for more information. CLyther enables developers to write GPGPU code entirely in Python with no additional syntax. CLyther’s core driver contains a Python compiler that converts Python functions and types to OpenCL at runtime.

CLyther currently only supports a subset of the Python language definition but adds many new features to OpenCL such as:

  • OpenCL interface similar to PyOpenCL
  • Dynamic compilation of OpenCL code at runtime
  • Fast prototyping of OpenCL code
  • Create OpenCL code using the Python language definition
  • Passing functions as arguments to OpenCL kernels
  • Pure Python emulation mode of kernel functions
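To make the last two items more concrete, here is a minimal sketch of what emulating a kernel in pure Python can look like. It does not use CLyther's API: `emulate_kernel`, `map_kernel`, and `square` are hypothetical names chosen for illustration, and the "kernel" is simply called once per work-item index on the CPU.

```python
def emulate_kernel(kernel, global_size, *args):
    """Run `kernel` once per work-item index, sequentially, on the CPU.

    Hypothetical helper for illustration only; not part of CLyther.
    """
    for gid in range(global_size):
        kernel(gid, *args)


def map_kernel(gid, func, src, dst):
    """Kernel body: apply `func` (passed in as an argument) to one element."""
    dst[gid] = func(src[gid])


def square(x):
    """Function handed to the kernel as an ordinary argument."""
    return x * x


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * len(src)
emulate_kernel(map_kernel, len(src), square, src, dst)
print(dst)  # [1.0, 4.0, 9.0, 16.0]
```

Running the kernel body as ordinary Python makes it possible to step through it with a normal debugger before any OpenCL code is generated, which is presumably the appeal of an emulation mode.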


CUDA 3.0 toolkit released

March 20th, 2010

NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:

  • Support for the new Fermi architecture, with:
    • Native 64-bit GPU support
    • Multiple Copy Engine support
    • ECC reporting
    • Concurrent Kernel Execution
    • Fermi HW debugging support in cuda-gdb
    • Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
  • C++ Class Inheritance and Template Inheritance support for increased programmer productivity
  • A new unified interoperability API for Direct3D and OpenGL, with support for:
    • OpenGL texture interop
    • Direct3D 11 interop support
    • CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.

CLyther = Python + OpenCL

March 9th, 2010

CLyther is a Python tool for OpenCL, currently under development, that is similar to Cython for C. It is a Python language extension intended to make writing OpenCL code as easy as writing Python itself. CLyther currently supports only a subset of the Python language definition but adds many new features for OpenCL.

CLyther exposes both the OpenCL C library and language to Python. Its features include:

  • Fast prototyping of OpenCL code.
  • OpenCL kernel function creation using the Python language definition.
  • Strong OOP programming in OpenCL code.
  • Passing functions as arguments to kernel functions.
  • Python emulation mode for OpenCL code.
  • Fancy indexing of arrays.
  • Dynamic compilation at runtime.


Some older publications worth reading

January 17th, 2010

Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted in a timely manner, please remember to use the news submission form.

  • Joshua A. Anderson, Chris D. Lorenz and Alex Travesset present and discuss molecular dynamics simulations and compare a single GPU against a 36-CPU cluster (General purpose molecular dynamics simulations fully implemented on graphics processing units, Journal of Computational Physics 227(10), May 2008, DOI 10.1016/
  • Wen-mei Hwu et al. derive and discuss goals and concepts of programming models for fine-grained parallel architectures, from the point of view of both a programmer and a hardware/compiler designer, and analyze CUDA as one current representative (Implicitly parallel programming models for thousand-core microprocessors, Proceedings of DAC’07, June 2007, DOI 10.1145/1278480.1278669).
  • Jeremy Sugerman et al. present GRAMPS, a prototype implementation of future graphics hardware that allows pipelines to be specified as graphs in software (GRAMPS: A Programming Model for Graphics Pipelines, ACM Transactions on Graphics 28(1), January 2009, DOI 10.1145/1477926.1477930).
  • William R. Mark discusses concepts of future graphics architectures in this contribution to the 2008 ACM Queue special issue on GPUs (Future graphics architectures, ACM Queue 6(2), March/April 2008, DOI 10.1145/1365490.1365501).
  • BSGP by Qiming Hou et al. is a new programming language for general purpose GPU computing that achieves the same efficiency as well-tuned CUDA programs but makes code much easier to read, develop and maintain (BSGP: bulk-synchronous GPU programming, ACM Siggraph 2008, August 2008, DOI 10.1145/1399504.1360618).
  • Finally, Che et al. and Garland et al. survey the field of GPU computing and discuss many different application domains. These articles are, in addition to the ones we have collected on the developer pages, recommended to GPGPU newcomers.

GPULib v1.2.2 released

November 25th, 2009

GPULib provides a library of mathematical functions that facilitate the use of high performance computing resources available on modern graphics processing units (GPUs) by engineers, scientists, analysts, and other technical professionals with minimal modification to their existing programs. This software library executes vectorized mathematical functions on graphics processing units (GPUs) from NVIDIA, bringing high-performance numerical operations to everyday desktop computers. By providing bindings for a number of Very High Level Languages (VHLLs) including MATLAB and IDL from ITT Visual Information Solutions, GPULib can accelerate new applications or be incorporated into existing applications with minimal effort. No knowledge of GPU programming and memory management is required. For more information regarding GPULib, please visit

PyCUDA: GPU Run-Time Code Generation for High-Performance Computing

November 25th, 2009


High-performance scientific computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), and PyCUDA, an open-source toolkit that supports this technique.
In introducing PyCUDA, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. It is further observed that, compared to competing techniques, the effort required to create codes using run-time code generation with PyCUDA grows more gently in response to growing needs. The concept of RTCG is simple and easily implemented using existing, robust tools. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.

Preprint at arXiv

(Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih. PyCUDA: GPU Run-Time Code Generation for High-Performance Computing, submitted.)
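
As a rough illustration of the run-time code generation idea described in the abstract, the sketch below builds CUDA C kernel source from a template once the operation and element type are known. It is not taken from the paper: `KERNEL_TEMPLATE` and `generate_kernel` are hypothetical names, and the sketch only produces the source string, so it runs without a GPU.

```python
from string import Template

# Hypothetical template for an elementwise CUDA C kernel. The placeholders
# are filled in at run time, once the operation and element type are known.
KERNEL_TEMPLATE = Template("""
__global__ void ${name}(${dtype} *out, const ${dtype} *a, const ${dtype} *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ${expr};
}
""")


def generate_kernel(name, dtype, expr):
    """Return CUDA C source for an elementwise kernel specialized at run time."""
    return KERNEL_TEMPLATE.substitute(name=name, dtype=dtype, expr=expr)


source = generate_kernel("axpb", "float", "2.0f * a[i] + b[i]")
print(source)
# On a machine with PyCUDA and an NVIDIA GPU, a string like this would
# typically be handed to pycuda.compiler.SourceModule for compilation.
```

Because the kernel text is an ordinary Python string, the host program can specialize it per data type or per expression instead of shipping one generic, precompiled kernel.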

CUDA Fortran Compiler Beta Release Now Available

September 29th, 2009

A public beta release of the CUDA-enabled Fortran Compiler from PGI enables programmers to write code in Fortran for NVIDIA CUDA GPUs. From a press release:

What: NVIDIA today announced that a public beta release of the PGI® CUDA-enabled Fortran compiler is now available. Developed in collaboration with The Portland Group®, it is the first Fortran compiler compatible with NVIDIA® CUDA™-enabled graphics processing units (GPUs).

A compiler is a software tool that translates applications from the high-level programming languages used by software developers into a binary form a computer can execute.

Why: GPU computing with the CUDA C-compiler has gained significant momentum in the High-Performance Computing (HPC) space as it enables developers to get transformative increases in performance with minimal coding required.

Fortran is particularly well suited to numeric computation and scientific computing and remains widely used in a wide range of applications such as weather modeling, computational fluid dynamics and seismic processing.

Where can I get it?: Read the rest of this entry »

Intel acquires RapidMind

August 23rd, 2009

Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The RapidMind Platform continues to be available, including support. In the medium term RapidMind’s technology and products will be integrated with Intel’s data-parallel products, in particular Intel’s Ct technology.

This blog entry by James Reinders from Intel describes the acquisition and future plans in more detail.