October 26th, 2012
October 26th, 2012
Jacket enables GPU computing for MATLAB® codes. The new version v2.3 includes performance improvements and new support for CUDA 5.0. This newer version of CUDA enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line.
More information: http://blog.accelereyes.com/blog/2012/10/23/jacket-v2-3/
October 24th, 2012
GPUs have become a corner stone of computational research in high performance computing with over 200 commonly used applications already GPU-enabled. Researchers across many domains, such as Computational Chemistry, Biology, Weather & Climate, and Engineering, are using GPU-accelerated applications to greatly reduce time to discovery by achieving results that were simply not possible before.
Join Devang Sachdev, Sr. Product Manager, NVIDIA for an overview of the most popular applications used in academic research and an account of success stories enabled by GPUs. Learn also about a complimentary program which allows researchers to easily try GPU-accelerated applications on a remotely hosted cluster or Amazon AWS cloud.
Register at http://www.gputechconf.com/page/gtc-express-webinar.html.
October 22nd, 2012
This article proposes to address, in a tutorial style, the benefits of using Open Computing Language (OpenCL) as a quick way to allow programmers to express and exploit parallelism in signal processing algorithms, such as those used in error-correcting code systems. In particular, we will show how multiplatform kernels can be developed straightforwardly using OpenCL to perform computationally intensive low-density parity-check (LDPC) decoding, targeting them to run on a large set of worldwide disseminated multicore architectures, such as x86 general- purpose multicore central processing units (CPUs) and graphics processing units (GPUs). Moreover, devices with different architectures can be orchestrated to cooperatively execute these signal processing applications programmed in OpenCL. Experimental evaluation of the parallel kernels programmed with the OpenCL framework shows that high-performance can be achieved for distinct parallel computing architectures with low programming effort.
The complete source code developed and instructions for compiling and executing the program are available at http://www.co.it.pt/ldpcopencl for signal processing programmers who wish to engage with more advanced features supported by OpenCL.
(G. Falcao, V. Silva, L. Sousa and J. Andrade: “Portable LDPC Decoding on Multicores Using OpenCL [Applications Corner]“, IEEE Signal Processing Magazine 29:4(81-109), July 2012. [DOI])
October 15th, 2012
Accelerating numerical algorithms for solving sparse linear systems on parallel architectures has attracted the attention of many researchers due to their applicability to many engineering and scientific problems. The solution of sparse systems often dominates the overall execution time of such problems and is mainly solved by iterative methods. Preconditioners are used to accelerate the convergence rate of these solvers and reduce the total execution time. Sparse Approximate Inverse (SAI) preconditioners are a popular class of preconditioners designed to improve the condition number of large sparse matrices and accelerate the convergence rate of iterative solvers for sparse linear systems. We propose a GPU accelerated SAI preconditioning technique called GSAI, which parallelizes the computation of this preconditioner on NVIDIA graphic cards. The preconditioner is then used to enhance the convergence rate of the BiConjugate Gradient Stabilized (BiCGStab) iterative solver on the GPU. The SAI preconditioner is generated on average 28 and 23 times faster on the NVIDIA GTX480 and TESLA M2070 graphic cards respectively compared to ParaSails (a popular implementation of SAI preconditioners on CPU) single processor/core results. The proposed GSAI technique computes the SAI preconditioner in approximately the same time as ParaSails generates the same preconditioner on 16 AMD Opteron 252 processors.
(Maryam Mehri Dehnavi, David Fernandez, Jean-Luc Gaudiot and Dennis Giannacopoulos: “Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units”, IEEE Transactions on Parallel and Distributed Systems (to appear). [DOI])
October 11th, 2012
The CUDA 5 Production Release is now available as a free download at www.nvidia.com/getcuda.
This powerful new version of the pervasive CUDA parallel computing platform and programming model can be used to accelerate more of applications using the following four (and many more) new features.
• CUDA Dynamic Parallelism brings GPU acceleration to new algorithms by enabling GPU threads to directly launch CUDA kernels and call GPU libraries.
• A new device code linker enables developers to link external GPU code and build libraries of GPU functions.
• NVIDIA Nsight Eclipse Edition enables you to develop, debug and optimize CUDA code all in one IDE for Linux and Mac OS.
• GPUDirect Support for RDMA provides direct communication between GPUs in different cluster nodes
As a demonstration of the power of Dynamic Parallelism and device code linking, CUDA 5 includes a device-callable version of the CUBLAS linear algebra library, so threads already running on the GPU can invoke CUBLAS functions on the GPU. Read the rest of this entry »
October 11th, 2012
Seeing speedups of an accelerated application is great, but what does it take to build a codebase that will last for years and across architectures? In this webinar, John Stratton will cover some of the insights gained at the University of Illinois at Urbana-Champaign from experience with computer architecture, programming languages, and application development.
The webinar will offer three main conclusions including:
- Performance portability should be more achievable than many people think.
- The number one performance-limiting factor now and in the future will be parallel scalability.
- As much as we care about performance, general libraries that will last have to be reliable as well as fast.
Register at http://www.gputechconf.com/page/gtc-express-webinar.html
October 9th, 2012
The MicroCFD Virtual Wind Tunnel, Educational & Professional Edition, has recently been upgraded. The new version (1.8) supports multi-core CPUs and CUDA enabled GPUs and runs
significantly faster than the previous single-processor version. The results of a benchmark test on a system with an Intel quad-core CPU and an NVIDIA 96-core GPU show that an unsteady 2D or axis-symmetric compressible flow can now be run at a resolution of one million cells (Pro Edition) within a few minutes. A 3D version is currently under development and is expected to be released in 2014.
September 25th, 2012
AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.
AMD CodeXL increases developer productivity by helping them identify programming errors and performance issues in their application quickly and easily. Now developers can debug, profile and analyze their applications with a full system-wide view on AMD APU, GPU and CPUs.
AMD CodeXL user group (requires registration) allows users to interact with the CodeXL team, provide feedback, get support and participate in the beta surveys.
September 23rd, 2012
PCO13 is to be held in conjunction with IEEE IPDPS, Boston, USA, May 20-24, 2013. Paper Submission Deadline: December 21, 2012.
The workshop on Parallel Computing and Optimization aims at providing a forum for scientific researchers and engineers on recent advances in the field of parallel or distributed computing for difficult combinatorial optimization problems, like 0-1 multidimensional knapsack problems and cutting stock problems, large scale linear programming problems, nonlinear optimization problems and global optimization problems. Emphasis will be placed on new techniques for the solution of these difficult problems like cooperative methods for integer programming problems and polynomial optimization methods. Aspects related to Combinatorial Scientific Computing (CSC) will also be treated. Finally, the use of new approaches in parallel computing like GPU or hybrid computing, peer to peer computing and cloud computing will be considered. Application to planning, logistics, manufacturing, inance, telecommunications and computational biology will be considered.
Read the rest of this entry »
The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide range of engineering and science disciplines in both commercial and academic organizations. OpenFOAM has an extensive range of features to solve a wide range of fluid flows and physics phenomenon. OpenFOAM provides tools for all three stages of CFD, preprocessing, solvers, and post processing. Almost all are capable of being run in parallel as standard making it an important resource for a wide range of scientists and engineers using HPC for CFD.
General-purpose Graphics Processing Unit (GPU) technology is increasingly being used to accelerate compute-intensive HPC applications across various disciplines in the HPC community. OpenFOAM CFD simulations can take a significant amount of time and are computationally intensive. Comparing various alternatives for enabling faster research and discovery using CFD is of key importance. SpeedIT libraries from Vratis provide GPU-accelerated iterative solvers that replace the iterative solvers in OpenFOAM.
In order to investigate the GPU-acceleration of OpenFOAM, we simulate the three dimensional lid-driven cavity problem based on the tutorial provided with OpenFOAM. The 3D lid-driven cavity problem is an incompressible flow problem solved using OpenFOAM icoFoam solver. The majority of the computationally intensive portion of the solver is the pressure equation. In the case of acceleration, only the pressure calculation is offloaded to the GPUs. On the CPUs, the PCG solver with DIC preconditioner is used. In the GPU-accelerated case, the SpeedIT 2.1 algebraic multigrid precoditioner with smoothed aggregation (AMG) in combination with the SpeedIT Plugin to OpenFOAM is used.