June 26th, 2011
CUDA Template Generator is a Java application that generates CUDA C source file templates based on user input parameters. Features include:
- An algorithm for automatic block and thread definition, depending on array size.
- Automatic memory transfer functions for CPU->GPU->CPU communication.
- Generated C source code function template to use in your application.
Developed by Pavel Kartashev, as part of his Master’s Degree work.
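The post does not describe how the tool picks its block and thread counts. As a rough illustration of what "automatic block and thread definition, depending on array size" usually means for a one-dimensional launch, here is a minimal Python sketch. The function name `launch_config`, the 256-thread cap, and the rounding to a warp size of 32 are assumptions made for this example, not details of CUDA Template Generator.

```python
def launch_config(n, max_threads_per_block=256, warp_size=32):
    """Return (blocks, threads_per_block) covering n elements in 1-D.

    Hypothetical helper: the cap of 256 threads and the warp rounding
    are illustrative choices, not taken from the tool itself.
    """
    if n <= 0:
        raise ValueError("array size must be positive")
    # Never use more threads than the cap or than there are elements.
    threads = min(max_threads_per_block, n)
    # Round the block size up to a whole number of warps.
    threads = ((threads + warp_size - 1) // warp_size) * warp_size
    # Ceiling division so that blocks * threads >= n.
    blocks = (n + threads - 1) // threads
    return blocks, threads


if __name__ == "__main__":
    for size in (10, 256, 257, 1000):
        print(size, launch_config(size))
```

Because the grid is rounded up, the last block may hold threads with no element to process, so a generated kernel would typically guard its body with an index check against the array size.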
Posted in Developer Resources | Tags: Java, NVIDIA CUDA
June 26th, 2011
The performance of many math functions has improved with the release of the CUDA 4.0 Toolkit. This presentation includes the performance results of many of the key functions. Results include performance measurements for:
- cuFFT – Fast Fourier Transforms Library
- cuBLAS – Complete BLAS Library
- cuSPARSE – Sparse Matrix Library
- cuRAND – Random Number Generation (RNG) Library
- NPP – Performance Primitives for Image & Video Processing
- Thrust – Templated Parallel Algorithms & Data Structures
- math.h – C99 floating-point Library
Posted in Developer Resources | Tags: Libraries, NVIDIA CUDA
June 26th, 2011
Abstract:
A novel algorithm for solving in parallel a sparse triangular linear system on a graphical processing unit is proposed. It implements the solution of the triangular system in two phases. First, the analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially across the constructed levels. The solution elements corresponding to each single level are obtained at once in parallel. The numerical experiments are also presented and it is shown that the incomplete-LU and Cholesky preconditioned iterative methods, using the parallel sparse triangular solve algorithm, can achieve on average more than 2x speedup on graphical processing units (GPUs) over their CPU implementation.
(Maxim Naumov: “Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU”, NVIDIA Technical Report, June 2011. [WWW])
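The two phases described in the abstract can be sketched sequentially in Python. This is a simplified illustration, not the report's GPU implementation: it assumes a lower triangular matrix stored in CSR form (`indptr`, `indices`, `data`) with a nonzero diagonal entry in every row, and the function names are made up for the example. The inner loop over the rows of one level is the part that would run in parallel on a GPU.

```python
def build_levels(indptr, indices, n):
    """Analysis phase: group rows into levels from the sparsity pattern.

    Row i lands one level after the deepest row it depends on,
    i.e. any column j < i that holds a nonzero in row i.
    """
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level) + 1)] if n else []
    for i, lvl in enumerate(level):
        levels[lvl].append(i)
    return levels


def solve_lower(indptr, indices, data, b, levels):
    """Solve phase: walk the levels in order; rows within a level are independent."""
    x = [0.0] * len(b)
    for rows in levels:          # sequential across levels
        for i in rows:           # independent rows (parallel on a GPU)
            s = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif j < i:
                    s -= data[k] * x[j]
            x[i] = s / diag
    return x


if __name__ == "__main__":
    # 4x4 lower triangular example:
    # [2 0 0 0]
    # [0 4 0 0]
    # [1 0 1 0]
    # [0 1 1 2]
    indptr = [0, 1, 2, 4, 7]
    indices = [0, 1, 0, 2, 1, 2, 3]
    data = [2.0, 4.0, 1.0, 1.0, 1.0, 1.0, 2.0]
    levels = build_levels(indptr, indices, 4)
    print(levels)
    print(solve_lower(indptr, indices, data, [2.0, 8.0, 3.0, 8.0], levels))
```

In this example rows 0 and 1 have no dependencies and share the first level, row 2 waits for row 0, and row 3 waits for rows 1 and 2, so the solve takes three sequential steps instead of four.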
Posted in Developer Resources, Research | Tags: Numerical Algorithms, NVIDIA CUDA, Papers, Sparse Linear Systems
June 26th, 2011
GPIUTMD stands for Graphic Processors at Isfahan University of Technology for Many-particle Dynamics. It performs general-purpose many-particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to thousands of cores on a fast cluster. Flexible and configurable, GPIUTMD is currently being used for all-atom and coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants; dissipative particle dynamics (DPD) simulations of polymers; and crystallization of metals using EAM potentials.
GPIUTMD 0.9.6 adds many new features. Highlights include:
- Morse bond potential
- Constant acceleration applied to a group of particles (useful for modeling gravity effects)
- Full virial stress tensor computation (useful in mechanical characterization of materials)
- Long-ranged electrostatics via PPPM
- Support for CUDA 3.2
- Theory manual
- Up to a twenty percent performance boost in simulations
- and more
A demo version of GPIUTMD 0.9.6 will be available soon for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.
Posted in Developer Resources, Research | Tags: Molecular Dynamics, NVIDIA CUDA, Open Source, Particle Systems, Physics Simulation, Scientific Computing
June 14th, 2011
Abstract:
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
(Benjamin G. Levine, John E. Stone, and Axel Kohlmeyer: “Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units — Radial Distribution Function Histogramming”, Journal of Computational Physics, 230(9):3556-3569, 2011. [DOI: 10.1016/j.jcp.2011.01.048])
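To make the rate-limiting step concrete, here is a minimal CPU-only sketch in Python of the pair-distance histogram for one trajectory frame. It is an illustration under simplifying assumptions: no periodic boundary conditions, no normalization of the histogram into an RDF, and none of the tiling, atomic operations, or multi-GPU load balancing that the paper describes. The function name and parameters are made up for the example.

```python
import math


def distance_histogram(sel_a, sel_b, r_max, n_bins):
    """Count pair distances between two atom selections in one frame.

    sel_a, sel_b: sequences of (x, y, z) coordinates.
    Pairs at distance >= r_max are ignored.
    """
    hist = [0] * n_bins
    bin_width = r_max / n_bins
    for ax, ay, az in sel_a:
        for bx, by, bz in sel_b:
            r = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
            if r < r_max:
                # Clamp to guard against rounding at the upper edge.
                idx = min(int(r / bin_width), n_bins - 1)
                hist[idx] += 1
    return hist


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0)]
    b = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 5.0)]
    print(distance_histogram(a, b, r_max=4.0, n_bins=4))
```

The double loop visits every pair, so the work grows with the product of the two selection sizes. On a GPU many threads increment the same bins concurrently, which is where the atomic memory operations mentioned in the abstract come in.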
Posted in Research | Tags: hist, Molecular Dynamics, NVIDIA CUDA, Papers
May 29th, 2011
A two-day CUDA workshop will be held in Berlin on July 2-3 for developers who want to learn how to program and utilize the Graphics Processing Unit (GPU) using NVIDIA’s CUDA programming framework. No prior knowledge of parallel computing concepts is necessary, but some basic C/C++ knowledge is required. More information is available at http://cuda.eventbrite.com.
Posted in Events | Tags: NVIDIA CUDA, OpenCL, Workshops
May 11th, 2011
Alenka is a columnar SQL-like language for data processing on CUDA hardware. Alenka uses vector-based processing to perform SQL operations such as joins, groups, and sorts. The program can process very large data sets that do not fit into GPU or host memory: such sets are partitioned into pieces and processed separately. Get it here: https://sourceforge.net/projects/alenka/files/
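The idea of partitioning a data set into pieces and processing them separately can be illustrated with a small grouped sum over two columns. This Python sketch is not Alenka's code or syntax: the helper names and the chunk size are hypothetical, and in-memory lists stand in for column data that a real system would stream from disk to the GPU one piece at a time.

```python
from collections import defaultdict


def chunks(column, size):
    """Yield consecutive pieces of a column, each at most `size` long."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def grouped_sum(keys, values, chunk_size):
    """SUM(values) GROUP BY keys, computed one chunk at a time."""
    totals = defaultdict(int)
    for k_chunk, v_chunk in zip(chunks(keys, chunk_size), chunks(values, chunk_size)):
        # Aggregate within the piece that currently fits in memory.
        partial = defaultdict(int)
        for k, v in zip(k_chunk, v_chunk):
            partial[k] += v
        # Merge the small partial result into the running totals.
        for k, s in partial.items():
            totals[k] += s
    return dict(totals)


if __name__ == "__main__":
    keys = ["a", "b", "a", "c", "b"]
    values = [1, 2, 3, 4, 5]
    print(grouped_sum(keys, values, chunk_size=2))
```

Only the per-group partial results need to persist between pieces, so the memory footprint is bounded by the chunk size and the number of distinct groups rather than by the size of the full data set.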
Posted in Developer Resources | Tags: Databases, NVIDIA CUDA, Open Source
May 4th, 2011
SGC Ruby CUDA has been heavily updated. It is now available from the standard Ruby Gems repository. Updates include:
- Basic CUDA Driver and Runtime API support on CUDA 4.0rc2 with unit tests.
- Object-Oriented API.
- Exception classes for CUDA errors.
- Support for Linux and Mac OS X platforms.
- Documented with YARD.
See http://blog.speedgocomputing.com/2011/04/first-release-of-sgc-ruby-cuda.html for more details.
Posted in Developer Resources | Tags: NVIDIA CUDA, Programming Languages, Ruby
May 4th, 2011
Abstract:
This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising in computational electromagnetics. Numerical tests have shown that the performance of the new implementation reaches 69 GFLOPS in complex single precision arithmetic. Compared to the optimized six core Central Processing Unit (CPU) (Intel Xeon 5680) this performance implies a speedup by a factor of six. In terms of speed the new format is as fast as the best format published so far and at the same time it does not introduce redundant zero elements which have to be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can be handled using low cost commodity GPUs with limited amount of on-board memory.
(A. Dziekonski, A. Lamecki, and M. Mrozowski: “A memory efficient and fast sparse matrix vector product on a GPU”, Progress In Electromagnetics Research, Vol. 116, 49-63, 2011. [PDF])
Posted in Research | Tags: Linear Algebra, NVIDIA CUDA, Papers, Physics Simulation, Scientific Computing
May 4th, 2011
KGPU is a GPU computing framework for the Linux kernel. It allows the Linux kernel to directly execute CUDA programs running on GPUs. The motivation is to augment systems with GPUs so that, like user-space applications, the operating system itself can benefit from GPU acceleration. It can also offload computationally intensive work from the CPU by enabling the GPU as an extra computing device.
The current KGPU release includes a demo task with GPU augmentation: a GPU AES cipher-based eCryptfs, which is an encrypted file system on Linux. The read/write bandwidths are expected to be accelerated by a factor of 1.7 to 2.5 on an NVIDIA GeForce GTX 480 GPU.
The source code can be obtained from https://github.com/wbsun/kgpu, and news and release information can be found at http://code.google.com/p/kgpu/.
Posted in Developer Resources, Research | Tags: Linux, NVIDIA CUDA, Open Source, Operating Systems