New PGI 9.0 Compilers Simplify x64+GPU Programming

June 24th, 2009

Yesterday, The Portland Group announced the release of version 9.0 of its Fortran and C compilers with support for GPUs and x64 multi-core CPUs. An introduction to PGI Accelerator Fortran and C programming is available online, as is the PGI Accelerator v1.0 specification. Evaluation copies of the new PGI 9.0 compilers are available from The Portland Group web site; registration is required.

From the press release:

The use of Graphics Processing Units (GPUs) as general purpose accelerators has been a growing trend in high-performance computing (HPC). Until now, use of GPUs from Fortran applications has been extremely limited. Developers targeting GPU accelerators have had to program in C at a detailed level using sequences of function calls to manage movement of data between the x64 host and GPU, and to offload computations from the host to the GPU. The PGI Accelerator Fortran and C compilers automatically analyze whole program structure and data, split portions of an application between a multi-core x64 CPU and a GPU as specified by user directives, and define and generate a mapping of loops to automatically use the parallel cores, hardware threading capabilities and SIMD vector capabilities of modern GPUs.


CUDA GPU Memtest

June 14th, 2009

CUDA GPU memtest is a memory test utility for NVIDIA GPU memory that uses well-established patterns from memtest86/memtest86+ as well as additional stress tests. The tests are designed to find hard and soft memory errors.

CUDA GPU memtest is available via anonymous SVN from SourceForge. It is developed by Guochun Shi and Jeremy Enos.

R+CUDA: Enabling GPU Computing in the R Statistical Environment

June 14th, 2009

R is a popular open source environment for statistical computing, widely used in many application domains. The ongoing R+GPU project is devoted to moving frequently used R functions, mostly functions used in biomedical research, to the GPU using CUDA. If a CUDA-compatible GPU and driver are present on a user’s machine, the user need only prefix “gpu” to the original function name to take advantage of the GPU implementation of the corresponding R function.

Speedup measurements of the current implementation range as high as 80x, and contributions to the code base are cordially invited. R+GPU is developed at the University of Michigan’s Molecular and Behavioral Neuroscience Institute.

Three New Free NVIDIA CUDA Web Seminars

June 8th, 2009

NVIDIA is offering a series of free GPU computing webinars covering a range of topics from a basic introduction to the CUDA architecture to advanced topics such as data structure optimization and multi-GPU usage.

Several webinars are already scheduled; attendees are encouraged to pick the date and time that best suit their schedules. Visit the NVIDIA GPU Computing Online Seminars webpage for registration and further information. Additional webinars will be scheduled throughout the next few months, so check for future alerts and visit the NVIDIA online seminar schedule page often.

NVIDIA Announces Performance Primitives (NVPP) Library

June 8th, 2009

NVIDIA NVPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NVPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NVPP library is written to maximize flexibility, while maintaining high performance.

NVPP can be used in one of two ways:

  • A stand-alone library for adding GPU acceleration to an application with minimal effort. Using this route allows developers to add GPU acceleration to their applications in a matter of hours.
  • A cooperative library for interoperating with a developer’s GPU code efficiently.

Either route allows developers to harness the massive compute resources of NVIDIA GPUs while simultaneously reducing development times. The NVPP API matches the Intel Performance Primitives (IPP) library API, so porting existing IPP code to the GPU is straightforward. For more information and to sign up for access to the beta release of NVPP, visit the NVPP website.

F2C-ACC: A source-to-source translator from Fortran to C and C for CUDA

June 4th, 2009

F2C-ACC is a language translator to convert codes from Fortran into C and C for CUDA. The goal of this project is to reduce the time to convert and adapt existing large-scale Fortran applications to run on CUDA-accelerated clusters, and to reduce the effort to maintain both Fortran and CUDA implementations. Both translations are useful: C can be used for testing and as a base code for running on the IBM Cell processor, and the generated C for CUDA code serves as a basis for running on the GPU. The current implementation does not support all language constructs yet, but the generated human-readable code can be used as a starting point for further manual adaptations and optimizations.

F2C-ACC is developed by Mark Govett et al. at the NOAA Earth System Research Laboratory, and has been presented at the Path to Petascale NCSA/UIUC workshop on applications for accelerators and accelerator clusters.

Thrust: a Template Library for CUDA Applications

May 31st, 2009

Thrust is an open-source template library for data parallel CUDA applications featuring an interface similar to the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity while maintaining high performance. Note that Thrust supersedes Komrade, the initial release of the library; all future development will proceed under the new name.

Thrust is open source under the Apache 2.0 license and is available now. Download Thrust and check out the Thrust tutorial to get started.

The thrust::host_vector and thrust::device_vector containers simplify memory management and transfers between host and device. Thrust provides efficient algorithms for:

  • sorting – thrust::sort and thrust::sort_by_key
  • transformations – thrust::transform
  • reductions – thrust::reduce and thrust::transform_reduce
  • scans – thrust::inclusive_scan and thrust::transform_inclusive_scan
  • And many more!


MemtestG80: A Memory and Logic Tester for NVIDIA CUDA-enabled GPUs

May 25th, 2009

MemtestG80 is a software-based tester to test for “soft errors” in GPU memory or logic for NVIDIA CUDA-enabled GPUs. It uses a variety of proven test patterns (some custom and some based on Memtest86) to verify the correct operation of GPU memory and logic. It is a useful tool to ensure that given GPUs do not produce “silent errors” which may corrupt the results of a computation without triggering an overt error.

Precompiled binaries for Windows, Linux and OSX, as well as the source code, are available for download under the LGPL license. MemtestG80 is developed by Imran Haque and Vijay Pande.

GPUmat: GPU toolbox for MATLAB

May 25th, 2009

GPUmat, developed by the GP-You Group, allows MATLAB code to benefit from the compute power of modern GPUs. It is built on top of NVIDIA CUDA. The acceleration is transparent to the user: only variable declarations need to be changed, using new GPU-specific keywords, while algorithms themselves need not be modified. A wide range of standard MATLAB functions has been implemented. GPUmat is available as freeware for Windows and Linux from the GP-You download page.

University of Melbourne Workshop: High-Performance GPU Computing with NVIDIA CUDA

May 12th, 2009

A half-day workshop and discussion forum will be held from 8:45 to 13:00 on Wednesday, May 27, in Lecture Theatre 3 of the Alan Gilbert Building at The University of Melbourne, Victoria, Australia. A light lunch will be provided afterwards from 13:00 to 14:00. With speakers from NVIDIA and Xenon Systems, this workshop is hosted by the ARC Centre of Excellence for Mathematics and Statistics of Complex Systems (MASCOS) and the Department of Mathematics and Statistics at the University of Melbourne.

Due to recent advances in GPU hardware and software, so-called general-purpose GPU computing (GPGPU) is rapidly expanding from niche applications to the mainstream of high-performance computing. For HPC researchers, hardware gains have increased the imperative to learn this new computing paradigm, while high-level programming languages (in particular, CUDA) have lowered the barrier to entry, so that new developers can now rapidly port suitable applications from C/C++ running on CPUs to CUDA running on GPUs. For appropriate applications, GPUs have significant, even dramatic, advantages over CPUs in terms of both dollars per FLOPS and watts per FLOPS.

For more information see the workshop announcement.