The SpeedIT team recently benchmarked the sparse matrix-vector multiplication (SpMV) performance of CUSPARSE 4.0, CUSP 0.2.0 and SpeedIT 2.0 on 23 randomly chosen matrices from the University of Florida Sparse Matrix Collection. The comparisons were run on a Tesla C2050 in single and double precision. The full report is available at http://wp.me/p1ZihD-1.
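For readers unfamiliar with the operation being benchmarked, the following is a minimal CPU reference sketch of SpMV in C++. It assumes the compressed sparse row (CSR) storage format, which the announcement does not specify, and the names `CsrMatrix` and `spmv` are hypothetical; it is not the code of any of the three libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse matrix in compressed sparse row (CSR) form (assumed format).
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored entry
    std::vector<double> values;        // value of each stored entry
};

// Computes y = A * x, visiting only the stored (nonzero) entries of A.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols);
    assert(a.row_ptr.size() == a.rows + 1);
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        // Entries of row i live in the half-open range [row_ptr[i], row_ptr[i+1]).
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Each row is independent of the others, which is what makes the operation a natural fit for GPUs; the indirect reads of `x` through `col_idx` are typically what makes it hard to run fast.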
OCLTools is a compact suite of open source tools that gives OpenCL developers more alternatives for kernel compilation. With OCLTools, developers can embed the source code of their kernels (clear text or encrypted) directly into their program binaries, which removes the need to distribute kernel source code in the open while keeping the flexibility of runtime compilation. Precompiled kernel binaries can be embedded in the same way, which removes kernel compilation overhead from the application's run time.
For more information go to http://www.clusterchimps.org
AMD has just open-sourced Aparapi, a project that started in its JavaLabs team. Aparapi is an API for expressing data-parallel workloads in Java, together with a runtime component that converts the Java bytecode of compatible workloads into OpenCL™ so that it can be executed on a variety of GPU devices. More information can be found in this blog entry.
This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.
(Nathan Bell and Jared Hoberock: “Thrust: A Productivity-Oriented Library for CUDA”, GPU Computing Gems, Jade Edition, edited by Wen-mei W. Hwu, October 2011)
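To illustrate the STL-style interface the abstract refers to, here is a small sketch written with the plain C++ standard library so that it runs without a GPU. The function names `saxpy` and `sum` are hypothetical. In Thrust, the analogous calls would be `thrust::transform` and `thrust::reduce` over `thrust::device_vector` containers, typically with a functor in place of the lambda; that mapping is an assumption about how one would port this sketch, not code taken from the chapter.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// y <- a * x + y, expressed as one element-wise transform over two ranges.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}

// Sum of all elements, expressed as a reduction with initial value 0.
float sum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}
```

The point of the sketch is the shape of the code: the algorithm is named, the data is passed as iterator ranges, and no explicit loop or memory management appears at the call site.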
TidePowerd has released Version 2 of their GPU computing solution for the .NET framework, GPU.NET. Their platform allows developers to quickly and easily write GPU-accelerated applications completely in .NET-based languages. Some key benefits include:
- Stay in C# and treat kernel methods like any regular method
- “Boilerplate” GPU programming tasks such as memory transfer and GPU scheduling are abstracted from the developer
- Cross-platform and cross-hardware with a single binary
- Systems seamlessly adapt to new hardware without rewriting code
- Speed on par with native code
New version 2 features:
- Visual Studio Error list and IntelliSense integration
- On-device random number generation
- Double precision support
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academia partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.
With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.
Learn more at developer.amd.com/afds.
Today NVIDIA announced the upcoming 4.0 release of CUDA. While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features. With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool. Full details follow in the quoted press release below.
A simple tool for off-line compilation of OpenCL kernel code, called “OpenCLcc”, is now available at
OpenCLcc takes a text file with the OpenCL kernel code as input and calls the OpenCL run-time to compile it, echoing errors to the console.
SpeedIT Extreme 1.2 introduces support for complex numbers in single and double precision for all SpeedIT methods, such as fast sparse matrix-vector multiplication and the CG and BiCGSTAB solvers.
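As a rough illustration of what complex support means for a solver such as CG, the sketch below implements a textbook conjugate gradient iteration over `std::complex<double>` on the CPU. It assumes the matrix is Hermitian positive definite and is supplied as a callback; the names `dot`, `conjugate_gradient` and `apply_a` are hypothetical, and nothing here reflects the actual SpeedIT API or implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using Complex = std::complex<double>;
using Vec = std::vector<Complex>;

// Hermitian inner product <a, b> = sum_i conj(a_i) * b_i.
Complex dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Complex s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += std::conj(a[i]) * b[i];
    return s;
}

// Solves A x = b for a Hermitian positive definite A (assumption).
// apply_a(v) must return A * v; iteration stops when ||r|| <= tol
// or after max_iter steps.
Vec conjugate_gradient(const std::function<Vec(const Vec&)>& apply_a,
                       const Vec& b, double tol, std::size_t max_iter) {
    Vec x(b.size(), Complex(0.0));
    Vec r = b;  // residual b - A x for the initial guess x = 0
    Vec p = r;  // first search direction
    double rr = dot(r, r).real();
    for (std::size_t it = 0; it < max_iter && rr > tol * tol; ++it) {
        const Vec ap = apply_a(p);
        const Complex alpha = rr / dot(p, ap);  // step length along p
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        const double rr_new = dot(r, r).real();
        const double beta = rr_new / rr;  // weight of the old direction
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```

The only change relative to a real-valued CG is the conjugation inside the inner product; in a sparse setting, `apply_a` would be a complex SpMV.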
The OpenFOAM SpeedIT plugin version 1.1 has been released under the GPL License. The most important new features are:
- Multi-GPU support
- Tested on Fermi architecture (GTX460 and Tesla C2050)
- Automated submission of the domain to the GPU cards (using decomposePar from OpenFOAM)
- Optimized submission of computational tasks to the best GPU card in the system for any number of computational threads
- The plugin picks the most powerful GPU card for single-threaded cases
The OpenFOAM SpeedIT plugin is available at http://speedit.vratis.com.