November 10th, 2012
November 8th, 2012
This paper addresses the design, implementation and validation of an effective scheduling scheme for both regular and irregular applications on heterogeneous platforms. The scheduler uses an empirical performance model to dynamically schedule the workload, organized into a given number of chunks, and follows the Heterogeneous Earliest Finish Time (HEFT) scheduling algorithm, which ranks the tasks based on both their computation and communication costs. The evaluation of the proposed approach is based on three case studies – the SAXPY, the FFT and the Barnes-Hut algorithms – two regular and one irregular application. The scheduler was evaluated on a heterogeneous platform with one quad-core CPU-chip accelerated by one or two GPU devices, embedded in the GAMA framework. The evaluation runs measured the effectiveness, the efficiency and the scalability of the proposed method. Results show that the proposed model was effective in addressing both regular and irregular applications, on heterogeneous platforms, while achieving ideal (>=100%) levels of efficiency in the irregular Barnes-Hut algorithm.
(Artur Mariano, Ricardo Alves, Joao Barbosa, Luis Paulo Santos and Alberto Proenca: “A (ir)regularity-aware task scheduler for heterogeneous platforms”, Proceedings of the 2nd International Conference on High Performance Computing, Kiev, October 2012, pp 45-56,. [PDF])
November 8th, 2012
Supercomputing luminaries and experts like Jack Dongarra and Takayuki Aoki will be presenting in NVIDIA’s GPU Technology Theater at SC12. Talks will happen every 30 minutes and will also be webcast live with interactive Q&A on NVIDIA’s website. For the complete lineup of science and developer talks visit http://www.nvidia.com/object/sc12-technology-theater.html. SC12 takes place Nov. 10-16 in Salt Lake City, Utah.
November 8th, 2012
We describe an extension of our graphics processing unit (GPU) electronic structure program TeraChem to include atom-centered Gaussian basis sets with d angular momentum functions. This was made possible by a “meta-programming” strategy that leverages computer algebra systems for the derivation of equations and their transformation to correct code. We generate a multitude of code fragments that are formally mathematically equivalent, but differ in their memory and floating-point operation footprints. We then select between different code fragments using empirical testing to find the highest performing code variant. This leads to an optimal balance of floating-point operations and memory bandwidth for a given target architecture without laborious manual tuning. We show that this approach is capable of similar performance compared to our hand-tuned GPU kernels for basis sets with s and p angular momenta. We also demonstrate that mixed precision schemes (using both single and double precision) remain stable and accurate for molecules with d functions. We provide benchmarks of the execution time of entire self-consistent field (SCF) calculations using our GPU code and compare to mature CPU based codes, showing the benefits of the GPU architecture for electronic structure theory with appropriately redesigned algorithms. We suggest that the meta-programming and empirical performance optimization approach may be important in future computational chemistry applications, especially in the face of quickly evolving computer architectures.
(Alexey V Titov , Ivan S. Ufimtsev , Nathan Luehr and Todd J. Martínez: “Generating Efficient Quantum Chemistry Codes for Novel Architectures”, accepted for publication in the Journal of Chemical Theory and Computation, 2012. [DOI])
November 6th, 2012
The 21st High Performance Computing Symposium (HPC 2013), devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and distributed simulations. Along with these new capabilities come new challenges in computing and system modeling. The goal of HPC 2013 is to encourage innovation in high performance computing
and communication technologies and to promote synergistic advances in modeling methodologies and simulation. It will promote the exchange of ideas and information between universities, industry, and national laboratories about new developments in system modeling, high performance computing and communication, and scientific computing and simulation. Read the rest of this entry »
November 6th, 2012
The Sixth Workshop on General Purpose Processing Using GPUs (GPGPU6) is held in conjunction with ASPLOS XVIII, Houston, TX, March 17, 2013.
Overview: The goal of this workshop is to provide a forum to discuss new and emerging general-purpose purpose programming environments and platforms, as well as evaluate applications that have been able to harness the horsepower provided by these platforms. This year’s work is particularly interested on new heterogeneous GPU platforms. Papers are being sought on many aspects of GPUs, including (but not limited to):
- GPU applications + GPU compilation
- GPU programming environments + GPU power/efficiency
- GPU architectures + GPU benchmarking/measurements
- Multi-GPU systems + Heterogeneous GPU platforms
Submission Information: Authors should submit their papers using the ACM SIG Proceedings format in double-column style using the directions on the conference website at http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU6. Submitted papers will be evaluated based on originality, significance to topics, technical soundness, and presentation quality. At least one author must register and attend GPGPU to present the work. Accepted papers will be included in preliminary proceedings and distributed at the event. All papers will be made available at the workshop and will also be published in the ACM Conference Proceedings Series.
November 3rd, 2012
We’re looking for novel or interesting research topics in GPU computing, computer graphics, cloud graphics, game development, and applications of GPUs. We strongly encourage international attendees to submit early in order to receive notifications in time for US visa deadlines. Learn more at http://www.gputechconf.com/page/home.html.
October 30th, 2012
OpenCL CodeBench is a code creation and productivity tools suite designed to accelerate and simplify OpenCL software development. OpenCL CodeBench provides developers with automation tools for host code and unit test bench generation. Kernel code development on OpenCL is accelerated and enhanced through a language aware editor delivering advanced incremental code analysis features. Software Programmers new to OpenCL can choose to be guided through an Eclipse wizard, while the power users can leverage the command line interface with XML-based configuration files. OpenCL CodeBench Beta is now available for Linux and Windows operating systems.
October 26th, 2012
We present new format for storing sparse matrices on GPU. We compare it with several other formats including CUSPARSE which is today probably the best choice for processing of sparse matrices on GPU in CUDA. Contrary to CUSPARSE which works with common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and vector is significantly faster for many matrices. We demonstrate it on set of 1 600 matrices and we show for what types of matrices our format is profitable.
(Heller M., Oberhuber T.: “Improved Row-grouped CSR Format for Storing of Sparse Matrices on GPU”, Proceedings of Algoritmy 2012, 2012, Handlovičová A., Minarechová Z. and Ševčovič D. (ed.), pages 282-290, ISBN 978-80-227-3742-5) [ARXIV preprint]
October 26th, 2012
Jacket enables GPU computing for MATLAB® codes. The new version v2.3 includes performance improvements and new support for CUDA 5.0. This newer version of CUDA enables computation on the latest Kepler K20 GPUs of the NVIDIA Tesla product line.
More information: http://blog.accelereyes.com/blog/2012/10/23/jacket-v2-3/
GPUs have become a corner stone of computational research in high performance computing with over 200 commonly used applications already GPU-enabled. Researchers across many domains, such as Computational Chemistry, Biology, Weather & Climate, and Engineering, are using GPU-accelerated applications to greatly reduce time to discovery by achieving results that were simply not possible before.
Join Devang Sachdev, Sr. Product Manager, NVIDIA for an overview of the most popular applications used in academic research and an account of success stories enabled by GPUs. Learn also about a complimentary program which allows researchers to easily try GPU-accelerated applications on a remotely hosted cluster or Amazon AWS cloud.
Register at http://www.gputechconf.com/page/gtc-express-webinar.html.