May 24th, 2012
May 22nd, 2012
SpeedIT provides a set of accelerated solvers for sparse linear systems of equations. The library supports C/C++ and Fortran, and it can be used with OpenFOAM to accelerate CFD simulations. SpeedIT 2.1 contains two new preconditioners:
• Algebraic Multigrid with Smoothed Aggregation (AMG)
• Approximate Inverse (AINV)
With SpeedIT, OpenFOAM simulations on the GPU can be up to 3.5x faster than CG with DIC/DILU preconditioners on the CPU, and up to 1.6x faster when compared against GAMG.
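To make the role of a preconditioner concrete, here is a minimal host-side sketch of preconditioned conjugate gradient. It is not SpeedIT's API and it does not use AMG, AINV, DIC or DILU: it swaps in the simplest possible preconditioner (Jacobi, i.e. division by the diagonal) and a dense matrix so the example stays short. The names `pcg_jacobi`, `Mat` and `Vec` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense rows for brevity; real solvers use sparse storage

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Solves A x = b for a symmetric positive definite A.
// Assumption: Jacobi preconditioner M = diag(A), chosen only for illustration.
// Returns the number of iterations performed; x receives the solution estimate.
int pcg_jacobi(const Mat& A, const Vec& b, Vec& x,
               double tol = 1e-10, int max_iter = 1000) {
    const std::size_t n = b.size();
    assert(A.size() == n);
    x.assign(n, 0.0);
    Vec r = b;  // residual for the initial guess x = 0
    Vec z(n);
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];  // z = M^{-1} r
    Vec p = z;
    double rz = dot(r, z);
    const double bnorm = std::sqrt(dot(b, b));
    if (bnorm == 0.0) return 0;

    for (int k = 0; k < max_iter; ++k) {
        if (std::sqrt(dot(r, r)) <= tol * bnorm) return k;  // converged
        const Vec Ap = matvec(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return max_iter;
}
```

Calling `pcg_jacobi(A, b, x)` with a small symmetric positive definite `A` fills `x` with the solution. A stronger preconditioner changes only the two lines that compute `z`, which is where most of the iteration-count savings come from.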
See the SpeedIT website and blog for more details.
May 20th, 2012
Traditional CPU-based computing environments offer a variety of binary instrumentation frameworks. Instrumentation and analysis tools for GPU environments to date have been more limited. Panoptes is a binary instrumentation framework for CUDA that targets the GPU. By exploiting the GPU to run modified kernels, computationally-intensive programs can be run at the native parallelism of the device during analysis. To demonstrate its instrumentation capabilities, we currently implement a memory addressability and validity checker that targets CUDA programs.
Panoptes traces targeted programs by library interposition at runtime.
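As a rough illustration of what an addressability and validity checker has to track, the sketch below keeps one shadow state per byte on the host. This is not Panoptes code and says nothing about how Panoptes stores its metadata on the GPU; the class `ShadowMemory` and its hook names are hypothetical, and the hooks stand in for the points where an instrumented allocation, store or load would report to the checker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Per-byte shadow state: not allocated, allocated but never written, or written.
enum class ByteState : std::uint8_t { Unaddressable, Uninitialized, Valid };

class ShadowMemory {
public:
    // A new allocation is addressable, but its contents are not yet valid.
    void on_alloc(std::uintptr_t base, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) state_[base + i] = ByteState::Uninitialized;
    }

    // Freed bytes become unaddressable again.
    void on_free(std::uintptr_t base, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) state_.erase(base + i);
    }

    // A store is legal only if every byte is addressable; it then makes them valid.
    bool on_store(std::uintptr_t addr, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            if (query(addr + i) == ByteState::Unaddressable) return false;
        for (std::size_t i = 0; i < n; ++i) state_[addr + i] = ByteState::Valid;
        return true;
    }

    // A load is clean only if every byte has been written before.
    bool on_load(std::uintptr_t addr, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i)
            if (query(addr + i) != ByteState::Valid) return false;
        return true;
    }

    ByteState query(std::uintptr_t addr) const {
        const auto it = state_.find(addr);
        return it == state_.end() ? ByteState::Unaddressable : it->second;
    }

private:
    std::unordered_map<std::uintptr_t, ByteState> state_;
};
```

A hash map per byte is far too slow for real use; it is chosen here only so the three states and their transitions are easy to read.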
May 17th, 2012
This white paper describes the new Kepler GK110 Architecture from NVIDIA.
Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market.
Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60‐65% on the prior Fermi architecture.
In addition to greatly improved performance, the Kepler architecture offers a huge leap forward in power efficiency, delivering up to 3x the performance per watt of Fermi.
The paper describes features of the Kepler GK110 architecture, including
- Dynamic Parallelism;
- Grid Management Unit;
- NVIDIA GPUDirect™;
- New SHFL instruction and atomic instruction enhancements;
- New read-only data cache previously only accessible to texture;
- Bindless Textures;
- and much more.
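The SHFL instruction in the list above lets threads of one warp read each other's registers directly. The sketch below emulates that idea on the CPU to show the classic use, a tree reduction across 32 lanes. It is plain C++, not CUDA, and the names `Warp`, `shuffle_down` and `warp_reduce_sum` are hypothetical; on the device the inner step would be a shuffle-down intrinsic rather than an array copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;  // one register value per lane

// Emulated shuffle-down: lane i receives the value held by lane i + delta.
// Lanes whose source would fall outside the warp keep their own value.
Warp shuffle_down(const Warp& regs, std::size_t delta) {
    Warp out = regs;
    for (std::size_t lane = 0; lane + delta < kWarpSize; ++lane)
        out[lane] = regs[lane + delta];
    return out;
}

// Tree reduction: after log2(32) = 5 shuffle steps, lane 0 holds the warp-wide sum.
int warp_reduce_sum(Warp regs) {
    for (std::size_t delta = kWarpSize / 2; delta > 0; delta /= 2) {
        const Warp shifted = shuffle_down(regs, delta);
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            regs[lane] += shifted[lane];
    }
    return regs[0];  // only lane 0 is guaranteed to hold the full sum
}
```

The point of the pattern is that no shared memory is involved: each step exchanges register values only, and five steps suffice for 32 lanes.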
April 27th, 2012
TunaCode has released CUVILib v1.2, a library that accelerates imaging and computer vision applications in the medical, industrial and defense domains. It delivers very high performance and supports both CUDA and OpenCL. Modules include color operations (demosaic, conversions, correction, etc.), linear/non-linear filtering, feature extraction & tracking, motion estimation, image transforms and image statistics.
More information, including a free trial version: http://www.cuvilib.com/
April 18th, 2012
The Intel® SDK for OpenCL Applications now supports the OpenCL 1.1 full-profile on 3rd generation Intel® Core™ processors with Intel® HD Graphics 4000/2500. For the first time, OpenCL developers using Intel® architecture can utilize compute resources across both Intel® Processor and Intel HD Graphics. More information: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk
April 1st, 2012
The rCUDA Team is proud to announce a new version of the rCUDA framework which will include many new functionalities as well as boosted performance. This new version, cooked for over a year, will incorporate pipelined transfers, full multi-thread and multi-node capabilities, CUDA 4.1 support, global scheduler integration, support for CUDA C extensions, and native InfiniBand support. A closed beta testing program has been started. See the complete text at http://www.rcuda.net/index.php/news/19-new-revolutionary-version-of-rcuda-to-be-launched.html.
March 30th, 2012
Accelerate your science on the Titan Supercomputer later this year, by harnessing up to 20 petaflops of parallel processing using GPUs. Open to researchers from academia, government labs, and industry, the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program is the major means by which the scientific community gains access to some of the fastest supercomputers.
First, let INCITE know you are interested in GPU acceleration by completing a two-minute survey. Then determine if you want to submit a formal proposal by June 27, 2012.
Need help drafting your proposal? Attend a “how-to” webinar on Tuesday, April 24 to learn tips and tricks for drafting your proposal. For further questions about the call for proposals, please contact the INCITE manager at INCITE@DOEleadershipcomputing.org.
March 6th, 2012
AMD offers an OpenCL Programming Webinar Series to help software developers become experts in the latest technologies, standards and best practices. The series of three OpenCL webinars will be presented by Rob Farber.
1. April 10th, 10AM PDT: Introducing Portable Parallelism
- C and C++ APIs
- OpenCL Memory Spaces
- The OpenCL Execution Model
2. April 24th, 10AM PDT: Coordinating OpenCL Computations on One or More Heterogeneous Devices
- How to Concisely Utilize Multiple Command Queues and Coordinate Tasks Across Multiple Heterogeneous Devices such as Two GPUs + CPU
- Code Sample Discussion: Massively Parallel Random Number Test Framework
3. May 1st, 10AM PDT: Accelerate Rendering by an Order of Magnitude with OpenCL, Plus a View to the Multi-core and Web-enabled Future
- How to use OpenCL to Provide High-Quality, Fast Rendering in Combination with Primitive Restart
- Device Fission, Partitioning Hardware Capabilities for Optimal Resource Usage
- Looking to the Future – WebCL
Registration is limited. More Information: http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx
February 28th, 2012
PORTLAND, Ore., March 5 — The Portland Group, a wholly-owned subsidiary of STMicroelectronics, today announced availability of the 2012 release of the PGI line of high-performance parallelizing compilers and development tools for Linux, OS X and Windows. PGI 2012 is the first general release to include support for the OpenACC directive-based programming model for NVIDIA CUDA-enabled Graphics Processing Units (GPUs). This release is also the first to include the fully feature-enabled PGI CUDA C/C++ compiler for multi-core x64 CPUs from Intel and AMD. In addition, PGI 2012 includes a number of performance and feature enhancements for multi-core x64 processor-based HPC systems.
This Dr. Dobb’s Article by Rob Farber provides a tutorial on creating application plugins to accelerate Windows and Linux application performance using CUDA in dynamically loaded libraries.
Adding GPU capabilities to existing Windows and Linux apps can be done simply using plugins and the built-in support found in CUDA. This easy form of dynamic loading enables CUDA to be used selectively to hugely accelerate individual tasks within a larger application.
CUDA is maturing into a natural extension of the emerging CPU/GPU paradigm of high-speed computing, which makes it, and GPU computing in general, a candidate for all application development. A recent article in this tutorial series, Running CUDA Code Natively on x86 Processors, noted recent developments that allow CUDA programs to transparently compile and run on x86 processors. This article focuses on incorporating CUDA into Windows and Linux workflows by exploiting the capabilities of the NVIDIA compiler driver, nvcc, to create native runtime loadable plugins. Source code is provided to create and utilize CUDA plugins and even dynamically compile and link a CUDA source file into a running application (just like OpenCL).
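The host side of such a plugin scheme can be sketched with the POSIX dynamic loader. The example below is not taken from the article: it covers Linux-style `dlopen`/`dlsym` only (Windows would use `LoadLibrary` and `GetProcAddress` instead), the helper name `load_unary_plugin` and the `double(double)` entry-point signature are assumptions made for illustration, and older glibc versions need `-ldl` at link time.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical plugin entry-point type: one double in, one double out.
using UnaryFn = double (*)(double);

// Opens the shared library at `path` and looks up `symbol` in it.
// Returns nullptr if the library or the symbol cannot be found.
// On success the library handle is deliberately left open, because the
// returned function pointer is only valid while the library stays loaded.
UnaryFn load_unary_plugin(const char* path, const char* symbol) {
    assert(path != nullptr && symbol != nullptr);
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) return nullptr;  // dlerror() would describe the failure

    void* sym = dlsym(handle, symbol);
    if (sym == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    return reinterpret_cast<UnaryFn>(sym);
}
```

An application would call `load_unary_plugin` with the path of a shared object and the name of an exported `extern "C"` function, and fall back to a CPU code path when the call returns `nullptr`. That fallback is what lets the GPU-accelerated plugin remain optional.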