Intel acquires RapidMind

August 23rd, 2009

Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The RapidMind Platform continues to be available, including support. In the medium term RapidMind’s technology and products will be integrated with Intel’s data-parallel products, in particular Intel’s Ct technology.

This blog entry by James Reinders from Intel describes the acquisition and future plans in more detail.

Equalizer 0.9

August 17th, 2009

Equalizer 0.9, a framework for creating and deploying parallel, scalable OpenGL applications, has been released. The most notable new features in this release are:

  • Automatic cross-segment load-balancing for multidisplay installations
  • Dynamic Frame Resolution (DFR) for constant-framerate rendering
  • Compression Plugin API for runtime-loadable image compression engines

See the 0.9 release notes on the Equalizer website for a comprehensive list of new features, enhancements, optimizations and bug. A paperback Equalizer Programming and User Guide is available from Lulu.com. Commercial support, custom software development and porting services are available from Eyescale Software GmbH.

NVIDIA CUDA Toolkit and SDK version 2.3 Released

July 22nd, 2009

NVIDIA announced today it has released version 2.3 of the CUDA Toolkit and SDK for GPU Computing. This latest release supports several significant new features that deliver a major leap forward in getting the most performance out of NVIDIA’s massively parallel CUDA-enabled GPUs. This release of the CUDA Toolkit includes performance improvements and expanded support for the cuda-gdb hardware debugger.

Additional new features in CUDA Toolkit 2.3 include:

  • The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well.  See the CUDA Toolkit release notes for details.
  • The CUDA-GDB hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and the CUDA-GDB debugger is now available for all supported Linux distros.  (see below)
  • Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
  • The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. (See the release notes for details, including changes to LD_LIBRARY_PATH on Linux)
  • New support for fp16 <-> fp32 conversion intrinsics allows storage of data in fp16 format with computation in fp32.  Use of fp16 format is ideal for applications that require higher numerical range than 16-bit integer but less precision than fp32 and reduces memory space and bandwidth consumption.
  • The CUDA SDK has been updated to include: Read the rest of this entry »

Libra SDK: C/C++ for both the CPU and GPU

June 24th, 2009

GPU Systems has announced the Libra SDK, a robustly equipped C/C++ developer kit for fast and easy cross CPU-GPU access suited for scientific computations. The Libra 1.1 SDK includes a C/C++ Matlab-style API, sample programs and documentation. A downloadable trial version of Libra is available from the GPU Systems website, and a Libra demo presentation is also available.

Message Passing on GPUs and Data-Parallel Architectures

March 11th, 2009

Abstract:

This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors. As a case study, we design and implement the “DCGN” API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU based MPI implementations while providing fully-dynamic communication.

(Jeff A. Stuart and John D. Owens, Message Passing on Data-Parallel Architectures, Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium)

GPU Programming For The Rest Of Us

March 11th, 2009

This article by Jeff Layton at ClusterMonkey summarizes the history of GPU Computing in terms of high-level programming languages and abstractions, from the early days of GPGPU programming using graphics APIs, to Stream, CUDA and OpenCL. The second half of the article provides an introduction to the PGI 8.0 Technology Preview, which allows the use of pragmas to automatically parallelize and run compute-intensive kernels in standard C and Fortran code on accelerators like GPUs. (GPU Programming For the Rest Of Us, Jeff Layton, ClusterMonkey.net)

CUDA.NET 2.1 Released

February 27th, 2009

CUDA.NET 2.1 has been released with support for the NVIDIA CUDA 2.1 API. This version supports DirectX 10 interoperability and the new JIT compilation API. The library is supported on Windows and Linux operating systems. (CUDA.NET)

NVIDIA Releases Version 2.1 Beta of the CUDA Toolkit and SDK

December 23rd, 2008

DECEMBER 19, 2008- NVIDIA has announced the availability of version 2.1 beta of its CUDA toolkit and SDK. This is the latest version of the C-compiler and software development tools for accessing the massively parallel CUDA compute architecture of NVIDIA GPUs. In response to overwhelming demand from the developer community, this latest version of the CUDA software suite includes support for NVIDIA®® Tesla™ GPUs on Windows Vista and 32-bit debugger support for CUDA on RedHat Enterprise Linux 5.x (separate download).

The CUDA Toolkit and SDK 2.1 beta includes support for VisualStudio 2008 support on Windows XP and Vista and Just-In-Time (JIT) compilation for applications that dynamically generate CUDA kernels. Several new interoperability APIs have been added for Direct3D 9 and Direct3D 10 that accelerate communication to DirectX applications as well as a series of improvements to OpenGL interoperability.

CUDA Toolkit and SDK 2.1 beta also features support for using a GPU that is not driving a display on Vista, a beta of Linux Profiler 1.1 (separate download) as well as support for recent releases of Linux including Fedora9, OpenSUSE 11 and Ubuntu 8.04.

CUDA Toolkit and SDK 2.1 beta is available today for free download from www.nvidia.com/object/cuda_get.

The Khronos Group Releases OpenCL 1.0 Specification

December 9th, 2008

The Khronos™ Group today announced the ratification and public release of the OpenCL™ 1.0 specification, the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software. Proposed six months ago as a draft specification by Apple, OpenCL has been developed and ratified by industry-leading companies including 3DLABS, Activision Blizzard, AMD, Apple, ARM, Barco, Broadcom, Codeplay, Electronic Arts, Ericsson, Freescale, HI, IBM, Intel Corporation, Imagination Technologies, Kestrel Institute, Motorola, Movidia, Nokia, NVIDIA, QNX, RapidMind, Samsung, Seaweed, TAKUMI, Texas Instruments and Umeå University. The OpenCL 1.0 specification and more details are available at http://www.khronos.org/opencl/

At Khronos “Developer University” today at SIGGRAPH Asia in Singapore, Khronos members publicly launched OpenCL 1.0 with a presentation of the specification and source code examples.

OpenCL launched at SC08

November 19th, 2008

A launch event was held Monday night at Austin’s Rio Grande Mexican Restaurant in conjuntion with Supercomputing 2008, to celebrate the newly completed OpenCL specification.  No live demos of OpenCL applications were shown because the OpenCL spec must first be ratified by by all members of the Khronos Group before it can be publicly released.  Still, the fact that this group has completed the complex specification in less than six months is nothing less than amazing.  Macworld has posted an article discussing the event including interviews with members of the OpenCL working group.  More information about OpenCL is available at the Khronos Group Website.

Page 1 of 512345