The performance of many math functions has improved with the release of the CUDA 4.0 Toolkit. This presentation includes the performance results of many of the key functions. Results include performance measurements for:
- cuFFT – Fast Fourier Transforms Library
- cuBLAS – Complete BLAS Library
- cuSPARSE – Sparse Matrix Library
- cuRAND – Random Number Generation (RNG) Library
- NPP – Performance Primitives for Image & Video Processing
- Thrust – Templated Parallel Algorithms & Data Structures
- math.h – C99 floating-point Library
A novel algorithm for solving in parallel a sparse triangular linear system on a graphical processing unit is proposed. It implements the solution of the triangular system in two phases. First, the analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially across the constructed levels. The solution elements corresponding to each single level are obtained at once in parallel. The numerical experiments are also presented and it is shown that the incomplete-LU and Cholesky preconditioned iterative methods, using the parallel sparse triangular solve algorithm, can achieve on average more than 2x speedup on graphical processing units (GPUs) over their CPU implementation.
(Maxim Naumov: “Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU”, NVIDIA Technical Report, June 2011.)
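The two-phase scheme described in the abstract can be sketched roughly as follows. This is a minimal CPU-side illustration in plain Python, not the implementation from the report: the function names `build_levels` and `solve_lower` are hypothetical, and the matrix is assumed to be lower triangular, stored in CSR form, with a nonzero diagonal entry in every row. On a GPU, the inner loop over the rows of one level is what would run in parallel.

```python
def build_levels(indptr, indices):
    """Analysis phase: group the rows of a lower-triangular CSR matrix into levels.

    A row's level is one more than the highest level among the rows it
    depends on (its off-diagonal column indices), or 0 if it has none.
    """
    n = len(indptr) - 1
    level_of = [0] * n
    for i in range(n):
        deps = [j for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level_of[i] = 1 + max((level_of[j] for j in deps), default=-1)
    levels = [[] for _ in range(max(level_of, default=-1) + 1)]
    for i, lvl in enumerate(level_of):
        levels[lvl].append(i)
    return levels


def solve_lower(indptr, indices, data, b, levels):
    """Solve phase: walk the levels in order and solve every row of a level."""
    x = [0.0] * len(b)
    for rows in levels:      # sequential across levels
        for i in rows:       # rows within a level are independent of each other
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag  # assumes the diagonal entry is stored and nonzero
    return x


# Illustrative 4x4 system (made up for this sketch):
# [2 0 0 0]
# [0 4 0 0]
# [1 0 1 0]
# [0 3 2 5]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 3.0, 2.0, 5.0]
b = [2.0, 8.0, 4.0, 32.0]

levels = build_levels(indptr, indices)
x = solve_lower(indptr, indices, data, b, levels)
```

Because the levels depend only on the sparsity pattern, the analysis phase can be run once and its result reused for every solve with the same matrix structure, which is the common case inside a preconditioned iterative method.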
In Silicon Valley? Interested in C++? Join in an evening with Microsoft & NVIDIA to discuss new C++ technology for parallel computing. Register here: http://vnextmsvc.eventbrite.com/
- 5:45 PM Welcome & Registration
- 6:00 PM Heterogeneous Parallelism in General, C++ AMP in Particular, presented by Herb Sutter, Principal Architect for Windows C++, Microsoft
- 7:15 PM ALM tools for C++ in Visual Studio V.NEXT, presented by Rong Lu, Program Manager C++, Microsoft
- 8:00 PM The Power of Parallel, presented by the NVIDIA Team:
  - Parallel Nsight: Programming GPUs in Visual Studio, Stephen Jones, NVIDIA
  - CUDA 4.0: Parallel Programming Made Easy, Justin Luitjens, NVIDIA
  - Thrust: C++ Template Library for GPGPUs, Jared Hoberock, NVIDIA
Simulators are still the primary tools for development and performance evaluation of applications running on massively parallel architectures. However, current virtual platforms are not able to tackle the complexity issues introduced by 1000-core future scenarios. We present a fast and accurate simulation framework targeting extremely large parallel systems by specifically taking advantage of the inherent potential processing parallelism available in modern GPGPUs.
(S. Raghav, M. Ruggiero, D. Atienza, C. Pinto, A. Marongiu and L. Benini: “Scalable instruction set simulator for thousand-core architectures running on GPGPUs”, Proceedings of High Performance Computing and Simulation (HPCS), pp.459-466, June/July 2010.)
From a recent announcement:
Glare Technologies is proud to announce the release of Indigo Renderer 3.0 and Indigo RT. We use a hybrid GPU acceleration approach, which typically results in a 2-3x speedup when paired with a sufficiently powerful CPU. Realtime scene changes are possible, also in conjunction with network rendering to further accelerate rendering performance. A page outlining the other features and improvements of Indigo 3.0 and Indigo RT can be found at http://www.indigorenderer.com/indigo3 and http://www.indigorenderer.com/indigo_rt.
GPIUTMD stands for Graphic Processors at Isfahan University of Technology for Many-particle Dynamics. It performs general-purpose many-particle dynamic simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to thousands of cores on a fast cluster. Flexible and configurable, GPIUTMD is currently being used for all-atom and coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants; dissipative particle dynamics (DPD) simulations of polymers; and crystallization of metals using EAM potentials. GPIUTMD 0.9.6 adds many new features. Highlights include:
- Morse bond potential
- Constant acceleration applied to a group of particles (useful for modeling gravity effects)
- Computation of the full virial stress tensor (useful in mechanical characterization of materials)
- Long-ranged electrostatics via PPPM
- Support for CUDA 3.2
- Theory manual
- Up to twenty percent performance boost in simulations
- and more
A demo version of GPIUTMD 0.9.6 will be available soon for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.
To ease OpenCL development on Linux, Wendell Rodrigues has created wizards for Anjuta DevStudio that start OpenCL application projects using the SDKs from NVIDIA, AMD or Intel. Refer to his blog for details and downloads.
clpp is an OpenCL library of data-parallel algorithm primitives such as parallel prefix sum (“scan”), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables. For more information, visit http://code.google.com/p/clpp.
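To illustrate how a scan primitive serves as a building block for stream compaction, here is a minimal sequential sketch in plain Python. It does not use or reproduce the clpp API; the names `exclusive_scan` and `compact` are hypothetical. The scan follows the well-known up-sweep/down-sweep (Blelloch) pattern, in which every iteration of each inner loop touches disjoint elements and could therefore be handled by a separate work-item on a GPU.

```python
def exclusive_scan(values):
    """Exclusive prefix sum using the up-sweep/down-sweep pattern."""
    n = len(values)
    size = 1
    while size < n:          # pad to the next power of two
        size *= 2
    a = list(values) + [0] * (size - n)

    # Up-sweep: build partial sums in place, as in a balanced tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a[:n]


def compact(values, keep):
    """Stream compaction: keep the elements for which keep(v) is true, in order."""
    flags = [1 if keep(v) else 0 for v in values]
    offsets = exclusive_scan(flags)   # output position of each kept element
    total = (offsets[-1] + flags[-1]) if values else 0
    out = [None] * total
    for i, v in enumerate(values):    # each write targets a distinct slot
        if flags[i]:
            out[offsets[i]] = v
    return out


scanned = exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
nonzero = compact([5, 0, 3, 0, 0, 8], lambda v: v != 0)
```

The scan turns the keep/discard flags into output positions, so every surviving element knows where to write without coordinating with its neighbours.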
On June 28, 2011, StreamComputing will present a one-day course on OpenCL in Utrecht. The course covers general GPU computing principles and OpenCL specifics in a top-down fashion, including lectures and short lab sessions.
A new GPU users group is being established in South Africa. The first event will be held June 9, 2011. For more information, see http://www.meetup.com/GPGPU-ZA/