Libra 1.2 includes new OpenCL back end

June 8th, 2010

GPU Systems has added an OpenCL back end implementation to its Libra Technology compiler and runtime architecture. Libra version 1.2 now supports x86/x64, OpenGL/OpenCL and CUDA compute back ends. The OpenCL back end generates dynamic code specifically for AMD GPUs. The CUDA back end generator has also been enhanced with Fermi capabilities. This release also brings full BLAS 1, 2 and 3 standard math library functionality (matrix and vector, dense and sparse, complex, single and double precision), accessible through a standard C programming interface and library. The high-level approach of the Libra API enables developers to easily extend existing high-level functionality from their favorite programming language.


CUDA 3.0 toolkit released

March 20th, 2010

NVIDIA has released version 3.0 of the CUDA Toolkit, providing developers with tools to prepare for the upcoming Fermi-based GPUs. Highlights of this release include:

  • Support for the new Fermi architecture, with:
    • Native 64-bit GPU support
    • Multiple Copy Engine support
    • ECC reporting
    • Concurrent Kernel Execution
    • Fermi HW debugging support in cuda-gdb
    • Fermi HW profiling support for CUDA C and OpenCL in Visual Profiler
  • C++ Class Inheritance and Template Inheritance support for increased programmer productivity
  • A new unified interoperability API for Direct3D and OpenGL, with support for:
    • OpenGL texture interop
    • Direct3D 11 interop support
    • CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime such as CUFFT and CUBLAS.

Intel acquires RapidMind

August 23rd, 2009

Intel has acquired RapidMind, the company behind the RapidMind (formerly Sh) programming environment targeting multicore CPUs, AMD and NVIDIA GPUs and the Cell processor. The RapidMind Platform continues to be available, including support. In the medium term RapidMind’s technology and products will be integrated with Intel’s data-parallel products, in particular Intel’s Ct technology.

This blog entry by James Reinders from Intel describes the acquisition and future plans in more detail.

Equalizer 0.9

August 17th, 2009

Equalizer 0.9, a framework for creating and deploying parallel, scalable OpenGL applications, has been released. The most notable new features in this release are:

  • Automatic cross-segment load-balancing for multidisplay installations
  • Dynamic Frame Resolution (DFR) for constant-framerate rendering
  • Compression Plugin API for runtime-loadable image compression engines

See the 0.9 release notes on the Equalizer website for a comprehensive list of new features, enhancements, optimizations and bug fixes. A paperback Equalizer Programming and User Guide is available from Lulu.com. Commercial support, custom software development and porting services are available from Eyescale Software GmbH.

NVIDIA CUDA Toolkit and SDK version 2.3 Released

July 22nd, 2009

NVIDIA announced today it has released version 2.3 of the CUDA Toolkit and SDK for GPU Computing. This latest release supports several significant new features that deliver a major leap forward in getting the most performance out of NVIDIA’s massively parallel CUDA-enabled GPUs. This release of the CUDA Toolkit includes performance improvements and expanded support for the cuda-gdb hardware debugger.

Additional new features in CUDA Toolkit 2.3 include:

  • The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well.  See the CUDA Toolkit release notes for details.
  • The CUDA-GDB hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and the CUDA-GDB debugger is now available for all supported Linux distros.  (see below)
  • Each GPU in an SLI group is now enumerated individually, so compute applications can now take advantage of multi-GPU performance even when SLI is enabled for graphics.
  • The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. (See the release notes for details, including changes to LD_LIBRARY_PATH on Linux)
  • New support for fp16 <-> fp32 conversion intrinsics allows storage of data in fp16 format with computation in fp32. The fp16 format is ideal for applications that require a higher numerical range than 16-bit integers but less precision than fp32, and it reduces memory space and bandwidth consumption.
  • The CUDA SDK has been updated to include: Read the rest of this entry »
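
The fp16 storage trade-off described in the list above can be illustrated without a GPU. The following is a minimal sketch in plain Python, not CUDA code: it uses the standard library's `struct` module (format code `e`, IEEE 754 half precision) to stand in for the conversion intrinsics, and the helper names `to_fp16_bytes`, `from_fp16_bytes` and `roundtrip` are hypothetical. Python floats are double precision, so the "compute" side here is wider than the fp32 a GPU kernel would use.

```python
import struct


def to_fp16_bytes(x: float) -> bytes:
    """Round x to IEEE 754 half precision and return its 2-byte encoding."""
    return struct.pack("<e", x)


def from_fp16_bytes(b: bytes) -> float:
    """Decode a 2-byte half-precision value back to a Python float."""
    return struct.unpack("<e", b)[0]


def roundtrip(x: float) -> float:
    """Store x as fp16, then load it back for computation at higher precision."""
    return from_fp16_bytes(to_fp16_bytes(x))


values = [0.1, 1000.3, 40000.0, 65504.0]

# Storage cost: 2 bytes per value in fp16 versus 4 bytes per value in fp32.
fp16_size = len(struct.pack("<4e", *values))
fp32_size = len(struct.pack("<4f", *values))

for v in values:
    # 40000.0 exceeds the 16-bit signed integer range but still fits in fp16;
    # values such as 0.1 and 1000.3 come back slightly rounded.
    print(v, "->", roundtrip(v))
print("bytes in fp16:", fp16_size, "bytes in fp32:", fp32_size)
```

The sketch halves the storage footprint at the cost of rounding error, which is the trade the release note describes. Note that `struct` raises `OverflowError` for values above the largest finite fp16 value (65504), so a real application would need to clamp or rescale its data first.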

Libra SDK: C/C++ for both the CPU and GPU

June 24th, 2009

GPU Systems has announced the Libra SDK, a robustly equipped C/C++ developer kit for fast and easy cross CPU-GPU access suited for scientific computations. The Libra 1.1 SDK includes a C/C++ Matlab-style API, sample programs and documentation. A downloadable trial version of Libra is available from the GPU Systems website, and a Libra demo presentation is also available.

Message Passing on GPUs and Data-Parallel Architectures

March 11th, 2009

Abstract:

This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors. As a case study, we design and implement the “DCGN” API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated and provide similar performance to typical CPU based MPI implementations while providing fully-dynamic communication.

(Jeff A. Stuart and John D. Owens, Message Passing on Data-Parallel Architectures, Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium)
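
As a rough illustration of the sleep-based polling idea mentioned in the abstract, here is a minimal Python sketch using ordinary CPU threads. It is not DCGN code and makes no claim about how the paper implements its mailbox; the class `PolledMailbox`, its methods and the polling interval are all hypothetical names and values chosen for the example.

```python
import threading
import time
from collections import deque


class PolledMailbox:
    """Hypothetical mailbox: the receiver polls for messages, sleeping between checks."""

    def __init__(self, poll_interval: float = 0.001):
        self._messages = deque()
        self._lock = threading.Lock()
        self._poll_interval = poll_interval  # assumed value, tune per workload

    def send(self, msg):
        """Store a message for later retrieval."""
        with self._lock:
            self._messages.append(msg)

    def recv(self, timeout: float = 1.0):
        """Poll until a message is available, sleeping between polls."""
        deadline = time.monotonic() + timeout
        while True:
            with self._lock:
                if self._messages:
                    return self._messages.popleft()
            if time.monotonic() >= deadline:
                raise TimeoutError("no message arrived before the timeout")
            # Sleeping instead of spinning frees the CPU between polls,
            # at the cost of up to one poll interval of extra latency.
            time.sleep(self._poll_interval)


def demo():
    """Send one message from a second thread and receive it by polling."""
    box = PolledMailbox()

    def delayed_send():
        time.sleep(0.01)
        box.send("hello")

    sender = threading.Thread(target=delayed_send)
    sender.start()
    msg = box.recv()
    sender.join()
    return msg
```

The poll interval is the main design knob in a scheme like this: a shorter interval lowers message latency but burns more cycles on checks that find nothing.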

GPU Programming For The Rest Of Us

March 11th, 2009

This article by Jeff Layton at ClusterMonkey summarizes the history of GPU Computing in terms of high-level programming languages and abstractions, from the early days of GPGPU programming using graphics APIs, to Stream, CUDA and OpenCL. The second half of the article provides an introduction to the PGI 8.0 Technology Preview, which allows the use of pragmas to automatically parallelize and run compute-intensive kernels in standard C and Fortran code on accelerators like GPUs. (GPU Programming For the Rest Of Us, Jeff Layton, ClusterMonkey.net)

CUDA.NET 2.1 Released

February 27th, 2009

CUDA.NET 2.1 has been released with support for the NVIDIA CUDA 2.1 API. This version supports DirectX 10 interoperability and the new JIT compilation API. The library is supported on Windows and Linux operating systems. (CUDA.NET)

NVIDIA Releases Version 2.1 Beta of the CUDA Toolkit and SDK

December 23rd, 2008

DECEMBER 19, 2008 - NVIDIA has announced the availability of version 2.1 beta of its CUDA toolkit and SDK. This is the latest version of the C-compiler and software development tools for accessing the massively parallel CUDA compute architecture of NVIDIA GPUs. In response to overwhelming demand from the developer community, this latest version of the CUDA software suite includes support for NVIDIA® Tesla™ GPUs on Windows Vista and 32-bit debugger support for CUDA on RedHat Enterprise Linux 5.x (separate download).

The CUDA Toolkit and SDK 2.1 beta includes support for Visual Studio 2008 on Windows XP and Vista and Just-In-Time (JIT) compilation for applications that dynamically generate CUDA kernels. Several new interoperability APIs have been added for Direct3D 9 and Direct3D 10 that accelerate communication to DirectX applications, along with a series of improvements to OpenGL interoperability.

CUDA Toolkit and SDK 2.1 beta also features support for using a GPU that is not driving a display on Vista, a beta of Linux Profiler 1.1 (separate download), and support for recent Linux releases including Fedora 9, OpenSUSE 11 and Ubuntu 8.04.

CUDA Toolkit and SDK 2.1 beta is available today for free download from www.nvidia.com/object/cuda_get.
