From a recent press release:
Amdahl Software, a leading supplier of development tools for multi-core software, after extensive beta testing by evaluators in over a dozen countries and numerous end-user application markets, today announced the production release of OpenCL CodeBench. OpenCL CodeBench is an OpenCL code creation tool. It simplifies parallel software development, enabling developers to rapidly generate and optimize OpenCL applications. Engineering productivity is increased through the automation of overhead tasks. The tool suite enables engineers to work at higher levels of abstraction, accelerating the code development process. OpenCL CodeBench benefits both expert and novice engineers through a choice of command-line or guided, wizard-driven development methodologies. Close cooperation with IP, SoC and platform vendors will enable future releases of OpenCL CodeBench to more tightly optimize software for specific end-user platforms and development environments.
OpenCL CodeBench is available for trial or purchase. For additional information, please visit www.amdahlsoftware.com.
AccelerEyes has released dates for their upcoming CUDA and OpenCL training courses.
More information can be found on the courses’ webpages.
Acceleware has recently announced four courses on parallel programming:
- OpenCL on AMD APU CPUs: Jan 29 to Feb 1, 2013, Chicago, IL and Apr 9 to Apr 12, 2013, Los Angeles, CA
- 4 Day CUDA Course with an Oil and Gas focus: Mar 12 to Mar 15, 2013, Houston, TX
- 4 Day C++ AMP Training: Apr 23 to Apr 26, 2013, Seattle, WA
More information is available on the courses’ webpages.
The MicroCFD Virtual Wind Tunnel, Educational & Professional Edition, has recently been upgraded. The new version (1.8) supports multi-core CPUs and CUDA-enabled GPUs and runs significantly faster than the previous single-processor version. The results of a benchmark test on a system with an Intel quad-core CPU and an NVIDIA 96-core GPU show that an unsteady 2D or axisymmetric compressible flow can now be run at a resolution of one million cells (Pro Edition) within a few minutes. A 3D version is currently under development and is expected to be released in 2014.
The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base spans engineering and science disciplines in both commercial and academic organizations. OpenFOAM has an extensive range of features for solving a wide variety of fluid flows and physical phenomena. It provides tools for all three stages of CFD: preprocessing, solving, and post-processing. Almost all of these tools can run in parallel as standard, making OpenFOAM an important resource for scientists and engineers using HPC for CFD.
General-purpose Graphics Processing Unit (GPU) technology is increasingly being used to accelerate compute-intensive HPC applications across various disciplines in the HPC community. OpenFOAM CFD simulations can take a significant amount of time and are computationally intensive. Comparing various alternatives for enabling faster research and discovery using CFD is of key importance. SpeedIT libraries from Vratis provide GPU-accelerated iterative solvers that replace the iterative solvers in OpenFOAM.
To investigate the GPU acceleration of OpenFOAM, we simulate the three-dimensional lid-driven cavity problem based on the tutorial provided with OpenFOAM. The 3D lid-driven cavity problem is an incompressible flow problem solved using the OpenFOAM icoFoam solver. Most of the solver's computationally intensive work is in solving the pressure equation, so in the accelerated case only the pressure calculation is offloaded to the GPUs. On the CPUs, the PCG solver with the DIC preconditioner is used. In the GPU-accelerated case, the SpeedIT 2.1 algebraic multigrid preconditioner with smoothed aggregation (AMG) is used in combination with the SpeedIT Plugin to OpenFOAM.
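Because only the pressure solve is offloaded, the overall gain depends on how much of the total runtime that solve accounts for. The short sketch below applies Amdahl's law to make this concrete. The runtime fraction and the solver speedup are hypothetical values chosen for illustration; neither comes from the benchmark described above.

```python
# Amdahl's law: overall speedup when only part of a run is accelerated.
# Both input values below are hypothetical, not measured results.

def overall_speedup(accelerated_fraction, component_speedup):
    """Return the whole-run speedup when `accelerated_fraction` of the
    original runtime is made `component_speedup` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / component_speedup)

solver_fraction = 0.8   # hypothetical: share of runtime in the pressure solve
solver_speedup = 4.0    # hypothetical: GPU speedup of that solve alone

estimate = overall_speedup(solver_fraction, solver_speedup)
print(f"Estimated overall speedup: {estimate:.2f}x")
```

Under these assumed numbers the whole simulation runs about 2.5x faster even though the solver itself runs 4x faster, since the work left on the CPU sets the ceiling.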
The fall schedule for Acceleware’s training courses is now available.
- OpenCL: August 21-24, 2012, Houston, TX
- CUDA: October 2-5, 2012, San Jose, CA
- OpenCL: October 16-19, 2012, Calgary, AB
- CUDA: November 6-9, 2012, Houston, TX
- CUDA: December 4-7, 2012, New York, NY – Finance Focus
- AMP: December 11-14, 2012, Chicago, IL
More information: http://www.acceleware.com/training
PGI Release 12.6 is now out. New in this release:
- PGI Accelerator compilers — first release of the Fortran and C compilers to include comprehensive support for the OpenACC 1.0 specification including the acc cache construct and the entire OpenACC API library. See the PGI Accelerator page for a complete list of supported features.
- CUDA Toolkit — PGI Accelerator compilers and CUDA Fortran now include support for CUDA Toolkit version 4.2; version 4.1 is now the default.
Download a free trial from the PGI website at http://www.pgroup.com/support/download_pgi2012.php?view=current. An upcoming PGI webinar with Michael Wolfe, “Using OpenACC Directives with the PGI Accelerator Compilers”, sponsored by NVIDIA, takes place at 9:00 AM PDT on July 31st. Register at http://www.pgroup.com/webinar212.htm?clicksource=gpgpu712.
C++ Accelerated Massive Parallelism (C++ AMP) is a new open-specification heterogeneous programming model that builds on the established C++ language. Developed for heterogeneous platforms, C++ AMP is designed to accelerate the execution of C++ code by taking advantage of the data-parallel hardware that is commonly present as a GPU. These courses are aimed at programmers who are looking to develop comprehensive skills in writing and optimizing applications using C++ AMP.
SpeedIT provides a set of accelerated solvers for sparse linear systems of equations. The library supports C/C++ and Fortran, and it can be used with OpenFOAM to accelerate CFD simulations. SpeedIT 2.1 contains two new preconditioners:
• Algebraic Multigrid with Smoothed Aggregation (AMG)
• Approximate Inverse (AINV)
OpenFOAM simulations on the GPU can be up to 3.5x faster than CG with DIC/DILU preconditioners on the CPU, and up to 1.6x faster than GAMG.
See the SpeedIT website and blog for more details.
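For readers unfamiliar with preconditioned conjugate gradient, the CPU baseline named above, here is a minimal pure-Python sketch. It is not SpeedIT or OpenFOAM code. It uses a simple Jacobi (diagonal) preconditioner as a stand-in for DIC, DILU, or AMG, and a small 1D Poisson matrix as an assumed test problem.

```python
# Minimal preconditioned conjugate gradient (PCG) for a dense SPD matrix.
# Assumption: a Jacobi (diagonal) preconditioner replaces DIC/AMG here.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; return (x, number of iterations used)."""
    n = len(b)
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                   # residual for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi: M^-1 = 1/diag(A)
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def poisson_1d(n):
    """Tridiagonal [-1, 2, -1] matrix, a standard SPD test problem."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

A = poisson_1d(8)
b = [1.0] * 8
x, iters = pcg(A, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
print(f"converged in {iters} iterations, max residual {residual:.2e}")
```

A stronger preconditioner changes only the lines that compute `z`. The rest of the iteration stays the same, which is why a solver library can swap DIC for AMG without altering the outer loop.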
NVIDIA Kepler GK110 Die Shot
This white paper describes the new Kepler GK110 Architecture from NVIDIA.
Comprising 7.1 billion transistors, Kepler GK110 is not only the fastest, but also the most architecturally complex microprocessor ever built. Adding many new innovative features focused on compute performance, GK110 was designed to be a parallel processing powerhouse for Tesla® and the HPC market.
Kepler GK110 will provide over 1 TFlop of double precision throughput with greater than 80% DGEMM efficiency versus 60‐65% on the prior Fermi architecture.
In addition to greatly improved performance, the Kepler architecture offers a huge leap forward in power efficiency, delivering up to 3x the performance per watt of Fermi.
The paper describes features of the Kepler GK110 architecture, including
- Dynamic Parallelism;
- Grid Management Unit;
- NVIDIA GPUDirect™;
- New SHFL instruction and atomic instruction enhancements;
- New read-only data cache previously only accessible to texture;
- Bindless Textures;
- and much more.
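To give a feel for what the SHFL instruction enables, the sketch below emulates a shuffle-down sum reduction across one 32-lane warp. It is a CPU emulation in plain Python, not CUDA code, and the function names are illustrative only. On the GPU, each lane reads a register value directly from another lane in the same warp, so no shared memory is needed for the exchange.

```python
# CPU emulation of a warp-level "shuffle down" sum reduction.
# Assumption: the warp size is a power of two (32 on NVIDIA GPUs).

WARP_SIZE = 32

def shfl_down(values, delta):
    """Lane i receives the value held by lane i + delta. Lanes whose
    source would fall outside the warp keep their own value."""
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i]
            for i in range(n)]

def warp_reduce_sum(values):
    """Tree reduction: after log2(n) shuffle steps, lane 0 holds the sum."""
    vals = list(values)
    offset = len(vals) // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]

lanes = list(range(WARP_SIZE))      # each lane starts with its own index
total = warp_reduce_sum(lanes)
print(total)
```

Only lane 0 ends up with the full sum. The upper lanes accumulate partial values that are simply ignored, which is the usual pattern for this kind of reduction.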