GPU Tech Conference Keynotes Announced
August 28th, 2010
Two leading computing visionaries will speak at the GPU Technology Conference (GTC) in September. Prof. Klaus Schulten, renowned computational biologist from the University of Illinois at Urbana-Champaign, will deliver a keynote highlighting discoveries made using the ‘computational microscope.’ Prof. Sebastian Thrun, robotics pioneer at Stanford University and distinguished engineer at Google, will speak on advances in GPU computing in computer vision and robotics. Registration is still open at www.nvidia.com/gtc.
Back 40 Computing: High Performance GPU Building Blocks
August 22nd, 2010
The Back 40 Computing project aims to provide a collection of high performance GPU computing building blocks. It is maintained by Duane Merrill from the University of Virginia. Highlights of the current release include the fastest radix sort implementation on GPUs to date, capable of sorting over 1 billion keys per second. For more details, see this (pre-Fermi) technical report (direct PDF link).
Source code and documentation are available on Google Code.
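For readers unfamiliar with the technique, here is a minimal sequential sketch of least-significant-digit radix sort. It is not the Back 40 Computing code, which runs on the GPU; the function name `radix_sort_u32` and the 8-bit digit width are assumptions chosen for illustration. The three steps per pass (histogram, exclusive prefix sum, stable scatter) are the ones a GPU implementation would parallelize.

```python
def radix_sort_u32(keys, bits_per_pass=8):
    """Sort integers in [0, 2**32) with least-significant-digit radix sort."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, 32, bits_per_pass):
        # Step 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Step 2: exclusive prefix sum gives each digit's first output slot.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Step 3: stable scatter, preserving the order of equal digits.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Each pass is stable, which is why sorting digit by digit from the least significant end yields a fully sorted sequence after the last pass.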
Free Workshop in Perth, Australia: High Performance GPU Computing with NVIDIA CUDA and Fermi
August 4th, 2010
In this workshop, hosted by iVEC and the University of Western Australia on August 19th, you will learn about CUDA, the Fermi architecture, and Tesla GPU Computing products. Topics include the basics of programming GPUs using CUDA C and C++, the variety of available computational libraries for CUDA, tools for profiling and debugging CUDA applications, and approaches for optimizing CUDA parallel applications. You will also learn about CUDA-enabled desktop, workstation, and cluster computing solutions provided by Xenon Systems. The workshop will also include presentations on some of the ways these technologies are being used by researchers in Western Australia. Full details, including speakers and agenda, are available here (PDF).
CUVI Lib – CUDA for Vision and Imaging Library Launched
August 1st, 2010
TunaCode has announced the release of CUVI Lib v0.3 (beta version) for 32-bit and 64-bit Windows systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
- Optical Flow (Horn & Schunck)
- Optical Flow (Lucas & Kanade)
- Discrete Wavelet Transform (Forward and Inverse)
- Hough Transform
- Hough Lines (Lines Detector)
- Color Conversion (RGB-to-Gray and RGBA-to-Gray)
Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded here. Forums to discuss feedback and further ideas are available.
Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model
August 1st, 2010
Abstract:
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single central processing unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As proof of concept we reproduce the critical temperature of the 2D Ising model using finite size scaling techniques.
(Benjamin Block, Peter Virnau and Tobias Preis: “Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model”, Computer Physics Communications 181:9, 1549-1556, Sep. 2010. DOI Link. arXiv link)
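To illustrate the checkerboard idea mentioned in the abstract, here is a minimal sequential sketch of one Metropolis sweep. It is not the authors' code and does not use multi-spin coding, CUDA, or MPI; the function name, the coupling J = 1, and the requirement of an even lattice size with periodic boundaries are assumptions made for this example. The point is that all sites of one colour have neighbours only of the other colour, so each half-sweep could be executed in parallel.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of an L x L Ising lattice (L even, J = 1).

    Sites are split like a checkerboard. Every neighbour of a site has the
    opposite colour, so sites of one colour can be updated independently.
    """
    L = len(spins)
    for colour in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != colour:
                    continue
                # Sum of the four nearest neighbours, periodic boundaries.
                nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                # Energy change if this spin were flipped.
                delta_e = 2 * spins[i][j] * nn
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]

def magnetisation(spins):
    """Absolute magnetisation per site."""
    L = len(spins)
    return abs(sum(sum(row) for row in spins)) / (L * L)
```

A typical use would start from a random or fully aligned lattice, call `checkerboard_sweep` repeatedly at a fixed inverse temperature `beta`, and average `magnetisation` over the later sweeps.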
Free GPU Computing Workshop in Adelaide, South Australia
July 29th, 2010
eResearch SA, XENON Systems and NVIDIA invite you to attend a free workshop on GPU computing with CUDA. The workshop will be held at 1:00PM on Tuesday 10 August 2010 at Mawson Lakes, in the Mawson Centre Lecture Theatre MC1-02.
Register now by visiting: http://nvidia.eventbrite.com
A complete modular resultant algorithm targeted for realization on graphics hardware
July 29th, 2010
Abstract:
This paper presents a complete modular approach to computing bivariate polynomial resultants on Graphics Processing Units (GPU). Given two polynomials, the algorithm first maps them to a prime field for sufficiently many primes, and then processes each modular image individually. We evaluate each polynomial at several points and compute a set of univariate resultants for each prime in parallel on the GPU. The remaining “combine” stage of the algorithm comprising polynomial interpolation and Chinese remaindering is also executed on the graphics processor. The GPU algorithm returns coefficients of the resultant as a set of Mixed Radix (MR) digits. Finally, the large integer coefficients are recovered from the MR representation on the host machine. With the approach of displacement structure and efficient modular arithmetic we have been able to achieve more than 100x speed-up over a CPU-based resultant algorithm from Maple 13.
(Pavel Emeliyanenko: “A complete modular resultant algorithm targeted for realization on graphics hardware”, Proceedings of the 4th International Workshop on Parallel and Symbolic Computation (PASCO2010), pages 35-43, Grenoble, France, July 2010. DOI link. Direct PDF link.)
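The Chinese remaindering step with mixed-radix output can be sketched with Garner's algorithm. This is a generic illustration under stated assumptions, not the paper's GPU implementation: the function names are hypothetical, the moduli are assumed to be pairwise coprime, and the value being reconstructed is assumed to be non-negative and smaller than the product of the moduli. It relies on Python 3.8+ for `pow(a, -1, m)`.

```python
def mixed_radix_digits(residues, moduli):
    """Garner's algorithm: residues modulo coprime moduli -> mixed-radix digits.

    The digits d satisfy x = d[0] + d[1]*m0 + d[2]*m0*m1 + ...
    Only arithmetic modulo a single small modulus is needed per digit.
    """
    digits = []
    for i, (r, m) in enumerate(zip(residues, moduli)):
        value = 0   # value of the digits found so far, reduced mod m
        weight = 1  # product of the earlier moduli, reduced mod m
        for j in range(i):
            value = (value + digits[j] * weight) % m
            weight = (weight * moduli[j]) % m
        # Solve value + d * weight == r (mod m) for the next digit d.
        digits.append(((r - value) * pow(weight, -1, m)) % m)
    return digits

def from_mixed_radix(digits, moduli):
    """Recover the large integer from its mixed-radix digits."""
    x = 0
    weight = 1
    for d, m in zip(digits, moduli):
        x += d * weight
        weight *= m
    return x
```

Splitting the work this way keeps the digit computation in small modular arithmetic, while the only big-integer arithmetic happens in `from_mixed_radix`, mirroring the division of labour between GPU and host described in the abstract.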
Swarm-NG: integration of an ensemble of N-body systems
July 29th, 2010
The Swarm-NG package helps scientists and engineers harness the power of GPUs. In the early releases, Swarm-NG will focus on the integration of an ensemble of N-body systems evolving under Newtonian gravity. Swarm-NG does not replicate existing libraries that calculate forces for large-N systems on GPUs, but rather focuses on integrating an ensemble of many systems where N is small. This is of particular interest for astronomers who study the chaotic evolution of planetary systems. In the long term, we hope Swarm-NG will allow for the efficient parallel integration of user-defined systems of ordinary differential equations.
QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters
July 29th, 2010
Abstract:
We describe a parallel hybrid symplectic integrator for planetary system integration that runs on a graphics processing unit (GPU). The integrator identifies close approaches between particles and switches from symplectic to Hermite algorithms for particles that require higher resolution integrations. The integrator is approximately as accurate as other hybrid symplectic integrators but is GPU accelerated.
(Alexander Moore and Alice C. Quillen: “QYMSYM: A GPU-Accelerated Hybrid Symplectic Integrator That Permits Close Encounters”. Preprint on arXiv; code available.)
SMVM on GPU
July 29th, 2010
From the paper’s abstract:
A wide class of finite element electromagnetic applications requires computing very large sparse matrix vector multiplications (SMVM). Due to the sparsity pattern and size of the matrices, solvers can run relatively slowly. The rapid evolution of graphic processing units (GPUs) in performance, architecture and programmability makes them very attractive platforms for accelerating computationally intensive kernels such as SMVM. This work presents a new algorithm to accelerate the performance of the SMVM kernel on graphic processing units.
From the paper’s conclusion:
We have introduced several efficient techniques to accelerate the execution of the sparse matrix vector multiplication (SMVM) on NVIDIA graphic processing units. The proposed methods increased the performance of the SMVM kernel on GT 8800 up to 18.8 times compared to the quad core CPU and 3 times compared to previous work by Bell and Garland on accelerating SMVM for GPUs.
(M. Mehri Dehnavi, D. Fernandez and D. Giannacopoulos: “Finite element sparse matrix vector multiplication on GPUs”. IEEE Transactions on Magnetics, vol. 46, no. 8, pp. 2982-2985, August 2010. DOI 10.1109/TMAG.2010.2043511)
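As background on the kernel being accelerated, here is a minimal sequential sketch of sparse matrix vector multiplication using the common compressed sparse row (CSR) layout. It is a baseline illustration only and not the algorithm proposed in the paper; the function name and the choice of CSR are assumptions. On a GPU, the outer loop over rows is the natural unit of parallel work.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row r owns entries row_ptr[r] .. row_ptr[r + 1] - 1
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access into x is what makes the kernel memory-bound.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For example, the matrix with rows (4, 0, 1), (0, 3, 0) and (2, 0, 5) is stored as `values = [4, 1, 3, 2, 5]`, `col_idx = [0, 2, 1, 0, 2]` and `row_ptr = [0, 2, 3, 5]`.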