Two leading computing visionaries will speak at the GPU Technology Conference (GTC) in September. Prof. Klaus Schulten, renowned computational biologist from the University of Illinois at Urbana-Champaign, will deliver a keynote highlighting discoveries made using the ‘computational microscope.’ Prof. Sebastian Thrun, robotics pioneer at Stanford University and distinguished engineer at Google, will speak on advances in GPU computing in computer vision and robotics. Registration is still open at www.nvidia.com/gtc.
The Back 40 Computing project aims to provide a collection of high-performance GPU computing building blocks. It is maintained by Duane Merrill of the University of Virginia. Highlights of the current release include the fastest GPU radix sort implementation to date, capable of sorting over one billion keys per second. For more details, see this (pre-Fermi) tech report (direct PDF link).
Source code and documentation are available on Google Code.
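High-throughput GPU radix sorts like Back40's are built from repeated digit-wise counting-sort passes, with the histogram and offset computations expressed as parallel prefix sums. As a minimal CPU sketch of the underlying algorithm (illustrative only, not Back40's actual API):

```cpp
#include <cstddef>
#include <cstdint>
#include <array>
#include <vector>

// One least-significant-digit (LSD) pass of a radix sort: a stable
// counting sort keyed on an 8-bit digit. A full sort repeats this for
// each digit position; GPU implementations replace the sequential
// histogram/offset steps with parallel scans.
std::vector<uint32_t> radix_pass(const std::vector<uint32_t>& keys, int shift) {
    std::array<std::size_t, 256> count{};          // histogram of the current digit
    for (uint32_t k : keys) count[(k >> shift) & 0xFF]++;

    std::array<std::size_t, 256> offset{};         // exclusive prefix sum -> bucket starts
    std::size_t sum = 0;
    for (int d = 0; d < 256; ++d) { offset[d] = sum; sum += count[d]; }

    std::vector<uint32_t> out(keys.size());
    for (uint32_t k : keys)                        // stable scatter into buckets
        out[offset[(k >> shift) & 0xFF]++] = k;
    return out;
}

std::vector<uint32_t> radix_sort(std::vector<uint32_t> keys) {
    for (int shift = 0; shift < 32; shift += 8)    // four passes for 32-bit keys
        keys = radix_pass(keys, shift);
    return keys;
}
```

Because each pass is stable, sorting digit by digit from least to most significant yields a fully sorted sequence after the final pass.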
Version 2.2 of the ATI Stream SDK has been released. Features include:
- Support for OpenCL™ 1.1 specification.
- Support for Ubuntu® 10.04 and Red Hat® Enterprise Linux® 5.5.
- Support for x86 CPUs with SSE2.x or later (extends the existing support for x86 CPUs with SSE3.x or later).
- Support for Microsoft® Visual Studio® 2010 Professional Edition and Minimalist GNU for Windows (MinGW) [GCC 4.4].
- Support for GNU Compiler Collection (GCC) 4.1 or later on Linux® systems (extends the existing support for GCC 4.3 or later).
- Support for single-channel OpenCL™ image format.
- Support for OpenCL™ / DirectX® 10 interoperability.
- Support for additional double-precision floating point routines in OpenCL™ C kernels.
- Support for generating and loading binary OpenCL™ kernels.
- Support for native OpenCL™ kernels.
- Preview Feature: Support for accessing additional physical memory on the GPU from OpenCL™ applications.
- Preview Feature: Support for printf() in OpenCL™ C kernels.
- Extension: Support for additional event states when registering event callbacks in OpenCL™ 1.1.
- Additional OpenCL™ samples.
- Package Update: ATI Stream Profiler 1.4.
- Various OpenCL™ compiler and runtime fixes and enhancements.
- Expanded OpenCL™ performance optimization guidelines in the ATI Stream SDK OpenCL™ Programming Guide.
The SDK and all documentation can be downloaded from http://developer.amd.com/stream.
In this workshop hosted by iVEC and the University of Western Australia on August 19th, you will learn about CUDA, the Fermi architecture, and Tesla GPU Computing products. You will learn about the basics of programming GPUs using CUDA C and C++, the variety of available computational libraries for CUDA, tools for profiling and debugging CUDA applications, and approaches for optimizing CUDA parallel applications. You will also learn about CUDA-enabled desktop, workstation, and cluster computing solutions provided by Xenon Systems. The workshop will also include presentations on some of the ways these technologies are being used by researchers in Western Australia. Full details including speakers and agenda here (PDF).
TunaCode has announced the release of CUVI Lib v0.3 (beta) for 32- and 64-bit Windows systems. A copy can be downloaded from http://www.cuvilib.com/downloads.
CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP. This version of CUVI Lib supports, among others:
- Optical Flow (Horn & Schunck)
- Optical Flow (Lucas & Kanade)
- Discrete Wavelet Transform (Forward and Inverse)
- Hough Transform
- Hough Lines (line detector)
- Color Conversion (RGB-to-gray and RGBA-to-gray)
Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded here, and forums are available for feedback and discussion of further ideas.
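The Horn & Schunck method listed above estimates a dense flow field (u, v) by minimizing a brightness-constancy term plus a smoothness term, which leads to a simple fixed-point iteration per pixel. Below is a minimal CPU sketch of that iteration (clamped boundaries, Jacobi updates); CUVI's GPU implementation and parameter choices may differ:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Flow { std::vector<float> u, v; };

// Horn–Schunck optical flow between two grayscale images I1, I2 (row-major,
// w x h). Each Jacobi iteration updates every pixel's flow toward the
// 4-neighbour average, corrected along the image gradient:
//   u <- ubar - Ix (Ix ubar + Iy vbar + It) / (alpha^2 + Ix^2 + Iy^2)
Flow horn_schunck(const std::vector<float>& I1, const std::vector<float>& I2,
                  int w, int h, float alpha, int iters) {
    auto at = [w, h](const std::vector<float>& img, int x, int y) {
        return img[std::min(h - 1, std::max(0, y)) * w +
                   std::min(w - 1, std::max(0, x))];     // clamped access
    };
    int n = w * h;
    std::vector<float> Ix(n), Iy(n), It(n);              // image derivatives
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            Ix[i] = 0.5f * (at(I1, x + 1, y) - at(I1, x - 1, y));
            Iy[i] = 0.5f * (at(I1, x, y + 1) - at(I1, x, y - 1));
            It[i] = I2[i] - I1[i];
        }
    Flow f{std::vector<float>(n, 0.f), std::vector<float>(n, 0.f)};
    for (int it = 0; it < iters; ++it) {
        Flow g = f;                                      // Jacobi: read old field
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                int i = y * w + x;
                float ub = 0.25f * (at(g.u, x-1, y) + at(g.u, x+1, y) +
                                    at(g.u, x, y-1) + at(g.u, x, y+1));
                float vb = 0.25f * (at(g.v, x-1, y) + at(g.v, x+1, y) +
                                    at(g.v, x, y-1) + at(g.v, x, y+1));
                float num = Ix[i] * ub + Iy[i] * vb + It[i];
                float den = alpha * alpha + Ix[i] * Ix[i] + Iy[i] * Iy[i];
                f.u[i] = ub - Ix[i] * num / den;
                f.v[i] = vb - Iy[i] * num / den;
            }
    }
    return f;
}
```

The per-pixel updates are independent within one iteration, which is what makes the method map so naturally onto a GPU: one thread per pixel.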
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation that also employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As a proof of concept we reproduce the critical temperature of the 2D Ising model using finite-size scaling techniques.
(Benjamin Block, Peter Virnau and Tobias Preis: “Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model”, Computer Physics Communications 181:9, 1549-1556, Sep. 2010. DOI Link. arXiv link)
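The checkerboard decomposition the paper builds on splits the square lattice into two sublattices, so that every site of one colour has neighbours only of the other colour and a whole sublattice can be updated simultaneously. A minimal CPU sketch of a checkerboard Metropolis sweep (J = 1, periodic boundaries; the paper's multi-spin coding and multi-GPU layers are omitted):

```cpp
#include <cmath>
#include <random>
#include <vector>

// 2D Ising model with checkerboard Metropolis updates. All sites of one
// colour interact only with the other colour, so the sequential loop below
// is equivalent to updating the whole sublattice in parallel -- on the GPU,
// one thread per site.
struct Ising2D {
    int L;                          // lattice is L x L
    std::vector<int> s;             // spins, +1 or -1
    std::mt19937 rng{12345};
    std::uniform_real_distribution<double> uni{0.0, 1.0};

    explicit Ising2D(int L_) : L(L_), s(L_ * L_, +1) {}

    int spin(int x, int y) const {  // periodic wrap
        return s[((y + L) % L) * L + (x + L) % L];
    }

    double energy() const {         // E = -sum over bonds of s_i * s_j
        double e = 0;
        for (int y = 0; y < L; ++y)
            for (int x = 0; x < L; ++x)
                e -= spin(x, y) * (spin(x + 1, y) + spin(x, y + 1));
        return e;
    }

    void half_sweep(double beta, int colour) {
        for (int y = 0; y < L; ++y)
            for (int x = 0; x < L; ++x) {
                if ((x + y) % 2 != colour) continue;
                int nb = spin(x+1, y) + spin(x-1, y) + spin(x, y+1) + spin(x, y-1);
                double dE = 2.0 * spin(x, y) * nb;      // energy cost of a flip
                if (dE <= 0 || uni(rng) < std::exp(-beta * dE))
                    s[y * L + x] = -spin(x, y);
            }
    }

    void sweep(double beta) { half_sweep(beta, 0); half_sweep(beta, 1); }

    double magnetization() const {
        double m = 0;
        for (int v : s) m += v;
        return m / s.size();
    }
};
```

Multi-spin coding, as used in the paper, goes one step further and packs many spins into the bits of a single machine word so that one bitwise operation updates them all; the sketch above keeps one spin per integer for clarity.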
eResearch SA, XENON Systems and NVIDIA invite you to attend a free workshop on GPU computing with CUDA. The workshop will be held at 1:00 PM on Tuesday, 10 August 2010 at Mawson Lakes, in the Mawson Centre Lecture Theatre MC1-02.
Register now by visiting: http://nvidia.eventbrite.com
This paper presents a complete modular approach to computing bivariate polynomial resultants on graphics processing units (GPUs). Given two polynomials, the algorithm first maps them to a prime field for sufficiently many primes, and then processes each modular image individually. We evaluate each polynomial at several points and compute a set of univariate resultants for each prime in parallel on the GPU. The remaining “combine” stage of the algorithm, comprising polynomial interpolation and Chinese remaindering, is also executed on the graphics processor. The GPU algorithm returns the coefficients of the resultant as a set of Mixed Radix (MR) digits. Finally, the large integer coefficients are recovered from the MR representation on the host machine. Using the displacement structure approach and efficient modular arithmetic, we achieve more than a 100x speed-up over a CPU-based resultant algorithm from Maple 13.
(Pavel Emeliyanenko: “A complete modular resultant algorithm targeted for realization on graphics hardware”, Proceedings of the 4th International Workshop on Parallel and Symbolic Computation (PASCO2010), pages 35-43, Grenoble, France, July 2010. DOI link. Direct PDF link.)
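The mixed-radix step the paper describes is the classical bridge between residue arithmetic and ordinary integers: Garner's algorithm converts residues modulo pairwise-coprime primes into mixed-radix digits, which the host then reassembles positionally. A small sketch with word-sized moduli (the paper's coefficients are of course far larger than an int64):

```cpp
#include <cstdint>
#include <vector>

// b^e mod m by binary exponentiation.
int64_t mod_pow(int64_t b, int64_t e, int64_t m) {
    int64_t r = 1;
    b %= m;
    for (; e > 0; e >>= 1) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
    }
    return r;
}

// Modular inverse for prime p via Fermat's little theorem: a^(p-2) mod p.
int64_t mod_inv(int64_t a, int64_t p) {
    return mod_pow((a % p + p) % p, p - 2, p);
}

// Garner's algorithm: residues r[i] = x mod m[i] (pairwise-coprime prime
// moduli) -> mixed-radix digits d, with x = d0 + d1*m0 + d2*m0*m1 + ...
std::vector<int64_t> to_mixed_radix(const std::vector<int64_t>& r,
                                    const std::vector<int64_t>& m) {
    std::vector<int64_t> d(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        int64_t v = r[i] % m[i];
        for (std::size_t j = 0; j < i; ++j) {
            v = (v - d[j] % m[i] + m[i]) % m[i];   // peel off digit j
            v = v * mod_inv(m[j], m[i]) % m[i];    // divide by radix m[j]
        }
        d[i] = v;
    }
    return d;
}

// Host-side reassembly by Horner's rule (fits in int64 for small examples).
int64_t from_mixed_radix(const std::vector<int64_t>& d,
                         const std::vector<int64_t>& m) {
    int64_t x = d.back();
    for (std::size_t i = d.size() - 1; i-- > 0; )
        x = d[i] + m[i] * x;
    return x;
}
```

The appeal of the mixed-radix form, as opposed to direct Chinese remaindering, is that each digit is small and digits can be produced independently per modulus, which suits the GPU-then-host split used in the paper.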
The Swarm-NG package helps scientists and engineers harness the power of GPUs. In the early releases, Swarm-NG will focus on the integration of an ensemble of N-body systems evolving under Newtonian gravity. Swarm-NG does not replicate existing libraries that calculate forces for large-N systems on GPUs, but rather focuses on integrating an ensemble of many systems where N is small. This is of particular interest for astronomers who study the chaotic evolution of planetary systems. In the long term, we hope Swarm-NG will allow for the efficient parallel integration of user-defined systems of ordinary differential equations.
We describe a parallel hybrid symplectic integrator for planetary system integration that runs on a graphics processing unit (GPU). The integrator identifies close approaches between particles and switches from symplectic to Hermite algorithms for particles that require higher resolution integrations. The integrator is approximately as accurate as other hybrid symplectic integrators but is GPU accelerated.
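The symplectic half of such a hybrid scheme is typically a leapfrog (kick-drift-kick) integrator, whose energy error oscillates rather than drifting secularly; the Hermite scheme is swapped in only during close encounters. As an illustrative sketch of the symplectic part alone, here is one leapfrog step for a test particle orbiting a unit-mass star (GM = 1) — this is a generic textbook integrator, not the paper's code:

```cpp
#include <cmath>

struct Body { double x, y, vx, vy; };

// Gravitational acceleration toward the origin: a = -r / |r|^3 (GM = 1).
static void accel(const Body& b, double& ax, double& ay) {
    double r2 = b.x * b.x + b.y * b.y;
    double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
    ax = -b.x * inv_r3;
    ay = -b.y * inv_r3;
}

// One velocity-Verlet (leapfrog) step: half kick, full drift, half kick.
void leapfrog_step(Body& b, double dt) {
    double ax, ay;
    accel(b, ax, ay);
    b.vx += 0.5 * dt * ax;  b.vy += 0.5 * dt * ay;   // half kick
    b.x  += dt * b.vx;      b.y  += dt * b.vy;       // drift
    accel(b, ax, ay);
    b.vx += 0.5 * dt * ax;  b.vy += 0.5 * dt * ay;   // half kick
}

// Specific orbital energy: kinetic + potential.
double energy(const Body& b) {
    double r = std::sqrt(b.x * b.x + b.y * b.y);
    return 0.5 * (b.vx * b.vx + b.vy * b.vy) - 1.0 / r;
}
```

In an ensemble code like Swarm-NG, many such small systems are integrated concurrently, so the natural GPU mapping is one system (or one body) per thread rather than parallelizing the force sum of a single large system.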