September 7th, 2009
OpenMM is an open-source library that enables molecular dynamics (MD) simulations to be accelerated on high-performance computer architectures, such as GPUs. This latest release adds support for:
- A complete set of C and Fortran wrappers
- Energy computations on GPUs
- Ewald summation
- A faster algorithm for handling constraints
- And more!
Download the latest version of OpenMM from http://simtk.org/home/openmm.
Posted in Developer Resources, Research | Tags: Computational Chemistry, Libraries, Molecular Dynamics, Open Source
August 23rd, 2009
EM Photonics has recently released a preview beta edition of CULAtools, an implementation of LAPACK for CUDA-enabled GPUs. This version comprises single-precision LU decomposition, QR factorization, singular value decomposition, and least squares. The full library, scheduled for release at NVIDIA GTC ’09, will contain much more functionality, in particular both single- and double-precision computations. Please refer to the website culatools.com for details, licenses, and downloads.
Posted in Business, Developer Resources | Tags: Libraries, Linear Algebra, Numerics, NVIDIA CUDA
August 6th, 2009
The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current “Multicore+GPU” systems.
The MAGMA research is based on the idea that, to address the complex challenges of emerging hybrid environments, optimal software solutions will themselves have to be hybridized, combining the strengths of different algorithms within a single framework. Building on this idea, the MAGMA group aims to design linear algebra algorithms and frameworks for hybrid manycore and GPU systems that enable applications to fully exploit the power that each of the hybrid components offers.
MAGMA v0.1 runs on CUDA-capable GPUs and multicore CPUs, and is available now.
Posted in Developer Resources, Research | Tags: Libraries, Linear Algebra, NVIDIA CUDA
July 30th, 2009
Ocelot, developed at Georgia Tech, seeks to develop a set of tools that enable the low-level analysis of GPGPU applications, as well as providing a JIT compiler for generic architectures. Ocelot currently provides an implementation of the NVIDIA CUDA runtime, capable of running the entire CUDA 2.2 and 2.1 SDKs.
Ocelot features include a memory checker similar to valgrind, detection mechanisms for non-coalesced memory accesses, full device emulation, and a number of useful debugging and performance tuning features. The Roadmap lists future developments.
Ocelot is available at Google Code, and a number of papers have been published.
Posted in Developer Resources, Research | Tags: Debugging, Libraries, NVIDIA CUDA, Open Source, Papers
July 7th, 2009
Sparse Matrix-Vector Multiplication Toolkit for Graphics Processing Units (SpMV4GPU) is a library optimized for NVIDIA Graphics Processing Units (GPUs). The GPU is fast emerging as the ideal architecture to use as an accelerator in a heterogeneous computing environment. Modern GPUs are designed not only for accelerating traditional graphics kernels, but also for general-purpose computationally intensive kernels. State-of-the-art GPUs exhibit very high computational capabilities at a reasonable price.
Sparse matrix-vector multiplication is a core numerical analysis kernel used in a wide range of application domains, such as graphics, data mining, and image processing. SpMV4GPU is developed using the NVIDIA C for CUDA language and API, and works on all NVIDIA GPUs with CUDA support. It uses the standard sparse matrix storage formats, such as compressed row and compressed column storage, and hides the intricacies of GPU programming behind an abstract interface. The interface also allows users to provide optional performance hints and to optionally use special storage representations. Experimental evaluation demonstrates that the library provides a two- to fourfold improvement over the equivalent solution provided by NVIDIA’s CUDPP library.
Along with the library, an IBM Research technical paper is available. (Muthu Manikandan Baskaran and Rajesh Bordawekar, “Optimizing Sparse Matrix-Vector Multiplication on GPUs”. IBM Research Technical Paper RC24704, 2008.)
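To make the compressed row storage format concrete, here is a minimal sequential sketch of sparse matrix-vector multiplication in plain C++. It is a CPU reference for the idea only, not SpMV4GPU code: the `CsrMatrix` struct and the `spmv` function are hypothetical names chosen for illustration, and the library's actual interface is not shown in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed row storage (CSR). Hypothetical struct for illustration only;
// this is not the SpMV4GPU interface.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i owns entries [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes y = A * x by walking the stored nonzeros of each row.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.col_idx.size() == a.values.size());
    assert(x.size() == a.cols);

    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Each row's dot product is independent of the others, which is what makes the kernel a natural candidate for data-parallel hardware; how the rows are mapped onto GPU threads is exactly the kind of detail a library like this hides.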
Posted in Developer Resources, Research | Tags: Libraries, Linear Algebra, NVIDIA CUDA, Sparse Linear Systems
July 1st, 2009
Release 1.1 of the CUDA Data-Parallel Primitives Library (CUDPP) is now available for download. The two major new features in CUDPP 1.1 are a very fast new radix sort implementation with support for sorting key-value pairs (with float or unsigned integer keys); and a new pseudorandom number generator, cudppRand. CUDPP 1.1 also replaces its former custom license with the standard BSD license. This greatly simplifies the CUDPP license details, and it also enables CUDPP to move into a public source repository such as Google Code in the near future. For more information, visit the CUDPP Website.
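For readers unfamiliar with key-value radix sorting, the following is a minimal sequential sketch in plain C++. It is not CUDPP code and says nothing about how CUDPP implements its sort: `radix_sort_pairs` is a hypothetical name, the 8-bit digit width is an arbitrary choice, and only unsigned 32-bit keys are handled (float keys are left out).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts keys in ascending order and applies the same permutation to values.
// Stable least-significant-digit radix sort: four passes of 8 bits each.
// Hypothetical helper for illustration; not part of CUDPP.
void radix_sort_pairs(std::vector<std::uint32_t>& keys,
                      std::vector<std::uint32_t>& values) {
    assert(keys.size() == values.size());
    const std::size_t n = keys.size();
    std::vector<std::uint32_t> keys_tmp(n);
    std::vector<std::uint32_t> values_tmp(n);

    for (unsigned shift = 0; shift < 32; shift += 8) {
        // Count how many keys have each value of the current 8-bit digit.
        std::array<std::size_t, 257> offsets{};
        for (std::size_t i = 0; i < n; ++i) {
            ++offsets[((keys[i] >> shift) & 0xFFu) + 1];
        }
        // A prefix sum over the counts gives each digit's starting offset.
        for (std::size_t d = 0; d < 256; ++d) {
            offsets[d + 1] += offsets[d];
        }
        // Scatter in input order so that equal digits keep their relative order.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = offsets[(keys[i] >> shift) & 0xFFu]++;
            keys_tmp[dst] = keys[i];
            values_tmp[dst] = values[i];
        }
        keys.swap(keys_tmp);
        values.swap(values_tmp);
    }
}
```

The counting and prefix-sum steps in each pass are the kind of data-parallel primitives that a library such as CUDPP provides, which is one reason radix sort is a common choice on GPUs.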
Posted in Developer Resources, Research | Tags: CUDPP, Data-Parallel, Libraries, NVIDIA CUDA, Open Source
June 26th, 2009
Abstract:
This paper reports on CuPP, our newly developed C++ framework designed to ease integration of NVIDIA’s GPGPU system, CUDA, into existing C++ applications. CuPP provides interfaces to reoccurring tasks that are easier to use than the standard CUDA interfaces. In this paper we concentrate on memory management and related data structures. CuPP offers both a low level interface — mostly consisting of smart pointers and memory allocation functions for GPU memory — and a high level interface offering a C++ STL vector wrapper and the so-called type transformations. The wrapper can be used by both device and host to automatically keep data in sync. The type transformations allow developers to write their own data structures offering the same functionality as the CuPP vector, in case a vector does not conform to the need of the application. Furthermore the type transformations offer a way to have two different representations for the same data at host and device, respectively. We demonstrate the benefits of using CuPP by integrating it into an example application, the open-source steering library OpenSteer. In particular, for this application we develop a uniform grid data structure to solve the k-nearest neighbor problem that deploys the type transformations. The paper finishes with a brief outline of another CUDA application, the Einstein@Home client, which also requires data structure redesign and thus may benefit from the type transformations and future work on CuPP.
(Jens Breitbart: CuPP – A framework for easy CUDA integration, HiPS 2009 workshop with IPDPS 2009, Rome, Italy, May 2009)
Posted in Developer Resources, Research | Tags: Libraries, NVIDIA CUDA, Papers
June 24th, 2009
This NVIDIA technical report by Sengupta, Harris, and Garland describes the design of new parallel algorithms for scan and segmented scan on GPUs. This paper describes the primitives included in the latest release of the CUDPP library.
Abstract:
Scan and segmented scan algorithms are crucial building blocks for a great many data-parallel algorithms. Segmented scan and related primitives also provide the necessary support for the flattening transform, which allows for nested data-parallel programs to be compiled into flat data-parallel languages. In this paper, we describe the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs. Our algorithms are designed using a divide-and-conquer approach that builds all scan primitives on top of a set of primitive intra-warp scan routines. We demonstrate that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts. These algorithms form the basis for current and upcoming releases of the widely used CUDPP library.
(S. Sengupta, M. Harris, and M. Garland. Efficient parallel scan algorithms for GPUs. NVIDIA Technical Report NVR-2008-003, December 2008)
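As a reminder of what the two primitives compute, here is a minimal sequential reference in plain C++. It only defines the expected results and does not reflect the intra-warp, divide-and-conquer design the report describes. The function names are hypothetical, and marking segment starts with a head-flag array is an assumption made for this sketch (a common representation, but not one stated in the abstract).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive scan (prefix sum): out[i] = in[0] + ... + in[i].
// Hypothetical reference function for illustration.
std::vector<int> inclusive_sum_scan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Segmented inclusive scan: the running sum restarts wherever head_flags[i]
// is nonzero, so each segment is scanned independently.
// The head-flag representation is an assumption of this sketch.
std::vector<int> segmented_inclusive_sum_scan(const std::vector<int>& in,
                                              const std::vector<int>& head_flags) {
    assert(in.size() == head_flags.size());
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (head_flags[i] != 0) {
            running = 0;  // a new segment starts here
        }
        running += in[i];
        out[i] = running;
    }
    return out;
}
```

For the input 3 1 7 0 4 1 6 3 with segments starting at positions 0, 3, and 5, the plain scan yields 3 4 11 11 15 16 22 25, while the segmented scan yields 3 4 11 0 4 1 7 10.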
Posted in Research | Tags: CUDPP, Data-Parallel, Libraries, NVIDIA CUDA, Papers, Parallel Algorithms
June 8th, 2009
NVIDIA NVPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NVPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NVPP library is written to maximize flexibility while maintaining high performance.
NVPP can be used in one of two ways:
- A stand-alone library for adding GPU acceleration to an application with minimal effort. Using this route allows developers to add GPU acceleration to their applications in a matter of hours.
- A cooperative library for interoperating with a developer’s GPU code efficiently.
Either route allows developers to harness the massive compute resources of NVIDIA GPUs, while simultaneously reducing development times. The NVPP API matches the Intel Performance Primitives (IPP) library API so that porting existing IPP code to the GPU is easy to do. For more information and to sign up for access to the beta release of NVPP, visit the NVPP website.
Posted in Developer Resources | Tags: Image Processing, Intel, Libraries, NVIDIA CUDA, NVPP, Performance Primitives, Video Processing
May 31st, 2009
Thrust is an open-source template library for data-parallel CUDA applications featuring an interface similar to the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity while retaining high performance. Note that Thrust supersedes Komrade, the initial release of the library, and all future development will proceed under this title.
Thrust is open source under the Apache 2.0 license and available now at http://thrust.googlecode.com. Download Thrust and check out the Thrust tutorial to get started.
The thrust::host_vector and thrust::device_vector containers simplify memory management and transfers between host and device. Thrust provides efficient algorithms for:
- sorting – thrust::sort and thrust::sort_by_key
- transformations – thrust::transform
- reductions – thrust::reduce and thrust::transform_reduce
- scans – thrust::inclusive_scan and thrust::transform_inclusive_scan
- And many more!
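Because the interface is described as STL-like, the host-side standard library gives a rough feel for the programming style. The sketch below uses only standard C++ (`std::sort`, `std::transform`, `std::partial_sum`, `std::inner_product`) as analogs of the algorithm families listed above. It is not Thrust code and does not run on the GPU; the function names are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sorts the input, squares each element, and returns the running totals.
// Host-only STL analog of a sort, a transformation, and an inclusive scan.
// Hypothetical helper for illustration; not part of Thrust.
std::vector<int> sorted_square_prefix_sums(std::vector<int> data) {
    // Sorting (the post names thrust::sort for this family).
    std::sort(data.begin(), data.end());

    // Transformation (the post names thrust::transform).
    std::transform(data.begin(), data.end(), data.begin(),
                   [](int x) { return x * x; });

    // Inclusive scan (the post names thrust::inclusive_scan).
    std::vector<int> sums(data.size());
    std::partial_sum(data.begin(), data.end(), sums.begin());
    return sums;
}

// Sum of squares in one pass, a host-side analog of a fused
// transform-and-reduce (the post names thrust::transform_reduce).
int sum_of_squares(const std::vector<int>& data) {
    return std::inner_product(data.begin(), data.end(), data.begin(), 0);
}
```

With the containers named in the post, the same pipeline would presumably operate on a `thrust::device_vector` instead of a `std::vector`; consult the Thrust tutorial for the exact calls and headers.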
Posted in Developer Resources | Tags: C/C++, Libraries, NVIDIA CUDA