December 14th, 2011
December 7th, 2011
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).
Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: “On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures”, The Computer Journal (in press) [DOI] [PREPRINT])
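To make the term "wavefront" concrete, here is a minimal sketch in Python. It is not the LU solver or the paper's GPU port: it assumes a simplified 2D grid and a made-up recurrence in which each cell depends only on its north and west neighbours. The function names `wavefront_diagonals` and `wavefront_sweep` are hypothetical. The point it illustrates is that all cells on the same anti-diagonal are independent of one another, which is the parallelism a pipelined wavefront code exploits.

```python
def wavefront_diagonals(nx, ny):
    """Group the cells of an nx-by-ny grid by anti-diagonal index i + j."""
    diagonals = [[] for _ in range(nx + ny - 1)]
    for i in range(nx):
        for j in range(ny):
            diagonals[i + j].append((i, j))
    return diagonals


def wavefront_sweep(source):
    """Sweep a 2D grid one anti-diagonal at a time.

    Illustrative recurrence (an assumption, not taken from the paper):
        value[i][j] = source[i][j] + value[i-1][j] + value[i][j-1]
    """
    nx, ny = len(source), len(source[0])
    value = [[0] * ny for _ in range(nx)]
    for cells in wavefront_diagonals(nx, ny):
        # Cells on one diagonal only read values from the previous diagonal,
        # so this inner loop is the part that could run in parallel.
        for i, j in cells:
            north = value[i - 1][j] if i > 0 else 0
            west = value[i][j - 1] if j > 0 else 0
            value[i][j] = source[i][j] + north + west
    return value


if __name__ == "__main__":
    grid = [[1, 1, 1], [1, 1, 1]]
    print(wavefront_diagonals(2, 3))
    print(wavefront_sweep(grid))
```

The early and late diagonals contain few cells, so a parallel device is underused at the start and end of each sweep; this is one reason sustained performance can fall well short of peak for this class of algorithm.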
December 7th, 2011
A major new release of the Intel SPMD Program Compiler (ispc) was posted on December 5, 2011. ispc is an extended version of the C programming language with support for “single program, multiple data” (SPMD) programming on the CPU; the SPMD model makes it easy to harness the full power of both the SIMD vector units and multiple cores on modern CPUs. The major features added in the 1.1 release include:
- Full support for pointers, including pointer arithmetic, function pointers, and all other features of pointers in C.
- A new parallel “foreach” statement, for more easily mapping computation to data.
- Substantially revised documentation, including a new Performance Guide.
- Many other small bug fixes and improvements.
ispc is open-source and is licensed under the BSD license. Source and binaries are available from http://ispc.github.com.
December 6th, 2011
Since the last WCCM (Sydney 2009), where we organized a similarly themed minisymposium, the scientific and engineering communities have gained much experience in using GPU hardware for their applications. The number of publications addressing GPU applications has skyrocketed, while researchers have developed much common understanding of how to implement numerical methods on this architecture. Moreover, we now find that three of the five fastest computers in the world, as measured for the Top500 list, are GPU-based systems. There is much conversation about GPUs playing a leading role in the exascale computing world. In summary, this topic is of wide interest; frankly, it is all the rage. This minisymposium will bring together presentations from top researchers around the world who use GPU hardware for applications in all branches of computational mechanics. We encourage contributions that address innovative methods to use GPUs efficiently, studies of how numerical methods can be adapted to the hardware, and perspectives on the future of GPUs as we advance toward exascale.
WCCM will be held in São Paulo, Brazil, 8–13 July 2012. The abstract submission deadline is December 31, 2011. More information: http://www.wccm2012.com, http://barbagroup.bu.edu/Barba_group/Events.html.
November 30th, 2011
The NVIDIA CUDA Toolkit 4.1 RC2 is now available for anyone to download. The key features of this release are:
- A new LLVM based compiler
- Over 1000 additional image processing functions in the NPP library
- A Visual profiler
There is also a new version of Parallel Nsight, 2.1 RC2, with support for CUDA 4.1. To download it and find out more, follow: http://bit.ly/sRpQvr
November 29th, 2011
Libra SDK is a sophisticated runtime including API, sample programs and documentation for massively accelerating software computations. This introductory tutorial provides an overview and usage examples of the powerful Libra API and math libraries executing on x86/x64, OpenCL, OpenGL and CUDA technology. The Libra API enables generic and portable CPU/GPU computing within software development without the need to create multiple, specific and optimized code paths to support x86, OpenCL, OpenGL or CUDA devices. Link to PDF: www.gpusystems.com/doc/LibraGenericComputing.pdf
November 28th, 2011
KOAP, pronounced “cope,” is a tool for developing OpenCL applications. Its purpose is to allow the programmer to aggregate and simplify calls to the OpenCL API. KOAP accepts as input a file containing (or including) both the OpenCL program and the host C program. KOAP understands several directives, each of which is prefixed with a $ character. When KOAP is run, these directives are replaced with the requisite OpenCL API calls. Programs preprocessed by KOAP can run on any target supported by OpenCL, including both NVIDIA and AMD GPUs.
KOAP is now freely available as a source code tar file from http://aggregate.org/KOAP/.
November 20th, 2011
Support for several types of compression has been added to the GPU-based database engine ålenkå. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression makes it possible to significantly reduce or eliminate I/O bottlenecks in analytical queries, as shown by ålenkå’s results in the Star Schema and TPC-H benchmarks.
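For readers unfamiliar with the three schemes named above, the following is a minimal CPU-side sketch in Python of the general ideas. It is not ålenkå's GPU implementation, and all function names here are hypothetical. Frame of reference stores a column's minimum once plus small offsets; FOR-DELTA applies the same idea to successive differences, which suits sorted or slowly changing columns; dictionary compression replaces repeated values with small integer codes.

```python
def for_encode(values):
    """Frame of reference: keep the minimum once, then non-negative offsets.

    Returns (base, bits, offsets), where `bits` is the width needed to store
    the largest offset.
    """
    if not values:
        return 0, 0, []
    base = min(values)
    offsets = [v - base for v in values]
    bits = max(offsets).bit_length()
    return base, bits, offsets


def for_decode(base, offsets):
    """Invert for_encode by adding the base back to every offset."""
    return [base + o for o in offsets]


def for_delta_encode(values):
    """FOR-DELTA: difference consecutive values, then FOR-encode the deltas."""
    if not values:
        raise ValueError("for_delta_encode needs at least one value")
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], for_encode(deltas)


def for_delta_decode(first, encoded):
    """Invert for_delta_encode with a running sum over the decoded deltas."""
    base, _bits, offsets = encoded
    out = [first]
    for delta in for_decode(base, offsets):
        out.append(out[-1] + delta)
    return out


def dict_encode(values):
    """Dictionary compression: sorted distinct values plus integer codes."""
    dictionary = sorted(set(values))
    code_of = {v: k for k, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def dict_decode(dictionary, codes):
    """Invert dict_encode by looking each code up in the dictionary."""
    return [dictionary[c] for c in codes]


if __name__ == "__main__":
    print(for_encode([1000, 1003, 1001, 1007]))
    print(for_delta_encode([100, 102, 105, 107]))
    print(dict_encode(["b", "a", "b"]))
```

In each case the saving comes from storing narrow integers instead of full-width values; a real engine would additionally bit-pack the offsets or codes to the reported width.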
November 18th, 2011
The 4th Workshop on using Emerging Parallel Architectures (WEPA 2012) will be held in conjunction with the International Conference on Computational Science (ICCS 2012), Omaha, Nebraska, June 2-4, 2012.
The computing landscape has undergone significant transformation with the emergence of more powerful processing elements such as GPUs, FPGAs, multi-cores, etc. On the multi-core front, Moore’s Law has extended beyond the single processor boundary with the prediction that the number of cores will double every 18 months. Going forward, the primary method of gaining processor performance will be through parallelism. Multi-core technology has visibly penetrated the global market. According to the latest Top500 lists, the HPC landscape has evolved from supercomputer systems into large clusters of dual or quad-core processors. Furthermore, GPUs, FPGAs and multi-cores have been shown to be formidable computing alternatives, where certain classes of applications witness more than one order of magnitude improvement over their GPP counterpart. Therefore, future computational science centers will employ resources such as FPGA and GPU architectures to serve as co-processors to offload appropriate compute-intensive portions of applications from the servers.
November 17th, 2011
From a recent press release:
Taipei, November 18, 2011: Zillians, a leading cloud solution provider specializing in high performance computing, GPU virtualization middleware and massive multi-player online game (MMOG) platforms, today announced the availability of vGPU – the world’s first commercial virtualization solution for decoupling GPU hardware from software. Traditionally, physical GPUs must reside on the same machine running GPU code. This severely hampers GPU cloud deployment due to the difficulty of dynamic GPU provisioning. With vGPU technology, bulky hardware is no longer a limiting factor. vGPU introduces a thin, transparent RPC layer between local application and remote GPU, enabling existing GPU software to run without any modification on a remote GPU resource.
ClusterChimps.org has released a step-by-step guide to integrating CUDA with GNU Autotools. The guide covers building standalone CUDA binaries, static CUDA libraries and shared CUDA libraries, and comes with an example tarball. For more information, go to http://www.clusterchimps.org/autotools.php