The HSA Foundation will be hosting a Birds of a Feather session on heterogeneous computing on July 24 from 1-2 p.m., at the Anaheim Convention Center, Room 202B. For more info: http://slidesha.re/16JSqK7
This paper addresses the design, implementation and validation of an effective scheduling scheme for both regular and irregular applications on heterogeneous platforms. The scheduler uses an empirical performance model to dynamically schedule the workload, organized into a given number of chunks, and follows the Heterogeneous Earliest Finish Time (HEFT) scheduling algorithm, which ranks the tasks based on both their computation and communication costs. The evaluation of the proposed approach is based on three case studies – the SAXPY, the FFT and the Barnes-Hut algorithms – two regular and one irregular application. The scheduler was evaluated on a heterogeneous platform with one quad-core CPU-chip accelerated by one or two GPU devices, embedded in the GAMA framework. The evaluation runs measured the effectiveness, the efficiency and the scalability of the proposed method. Results show that the proposed model was effective in addressing both regular and irregular applications, on heterogeneous platforms, while achieving ideal (>=100%) levels of efficiency in the irregular Barnes-Hut algorithm.
(Artur Mariano, Ricardo Alves, Joao Barbosa, Luis Paulo Santos and Alberto Proenca: “A (ir)regularity-aware task scheduler for heterogeneous platforms”, Proceedings of the 2nd International Conference on High Performance Computing, Kiev, October 2012, pp. 45-56. [PDF])
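To make the earliest-finish-time idea concrete, here is a minimal sketch, not the authors' scheduler. It assumes the chunks are independent (no task graph), so the HEFT rank reduces to each chunk's average computation plus communication cost across devices. The function name `schedule_chunks` and the cost tables are hypothetical, standing in for the estimates an empirical performance model would supply.

```python
def schedule_chunks(compute_cost, transfer_cost):
    """Greedy earliest-finish-time assignment of independent chunks.

    compute_cost[c][d]  -- estimated run time of chunk c on device d
    transfer_cost[c][d] -- estimated time to move chunk c's data to device d
    Returns (assignment, makespan), where assignment maps chunk -> device.
    """
    n_devices = len(compute_cost[0])

    def avg_cost(c):
        # Rank: average of computation + communication cost over all devices.
        total = sum(compute_cost[c][d] + transfer_cost[c][d]
                    for d in range(n_devices))
        return total / n_devices

    # Schedule the most expensive chunks first.
    order = sorted(range(len(compute_cost)), key=avg_cost, reverse=True)

    ready = [0.0] * n_devices  # time at which each device becomes free
    assignment = {}
    for c in order:
        # Finish time of chunk c on every device, given current device load.
        finish = [ready[d] + transfer_cost[c][d] + compute_cost[c][d]
                  for d in range(n_devices)]
        best = min(range(n_devices), key=finish.__getitem__)
        assignment[c] = best
        ready[best] = finish[best]
    return assignment, max(ready)


# Device 0 plays the role of a CPU (slow, no transfer),
# device 1 a GPU (fast, but data must be copied first).
compute = [[4.0, 1.0], [4.0, 1.0], [4.0, 1.0]]
transfer = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
assignment, makespan = schedule_chunks(compute, transfer)
print(assignment, makespan)
```

In this toy setup two chunks go to the GPU and one to the CPU, so both devices finish at the same time. A real scheduler would refresh the cost estimates at run time, which is what lets it cope with irregular workloads.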
AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.
AMD CodeXL increases developer productivity by helping developers identify programming errors and performance issues in their applications quickly and easily. Developers can now debug, profile and analyze their applications with a full system-wide view on AMD APUs, GPUs and CPUs.
The AMD CodeXL user group (requires registration) allows users to interact with the CodeXL team, provide feedback, get support and participate in the beta surveys.
Version 3.0 of the MC# programming system has been released. MC# is a universal parallel programming language aimed at any parallel architecture: multicore processors, systems with GPUs, or clusters. It is an extension of the C# language and supports a high-level parallel programming style.
The Intel® SDK for OpenCL Applications now supports the OpenCL 1.1 full-profile on 3rd generation Intel® Core™ processors with Intel® HD Graphics 4000/2500. For the first time, OpenCL developers using Intel® architecture can utilize compute resources across both Intel® Processor and Intel HD Graphics. More information: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk
Libra Platform is a GPGPU and heterogeneous compute API and runtime environment available on Windows, Mac and Linux. The Libra Compute API offers performance portability and direct compute access from standard programming environments (C/C++, Java, C# and Matlab) to execute math operations on current and future compute architectures, including the latest GPUs and x86/x64 CPUs, with broad support for compute devices compatible with low-level APIs: OpenCL, CUDA, OpenGL and standard x86/x64 compute APIs.
Read more in the full announcement.
We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are independent, we map these respectively to the GPUs and CPUs. Careful analysis of the FMM is performed to distribute work optimally between the multicore CPUs and the GPU accelerators. We first develop a single node version where the CPU part is parallelized using OpenMP and the GPU version via CUDA. New parallel algorithms for creating FMM data structures are presented together with load balancing strategies for the single node and distributed multiple-node versions. Our 8 GPU performance is comparable with performance of a 256 GPU version of the FMM that won the 2009 Bell prize.
(Qi Hu, Nail A. Gumerov and Ramani Duraiswami: “Scalable fast multipole methods on distributed heterogeneous architectures”, accepted for SC’11. [PDF])
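For readers new to the problem, the N-body sum that the FMM accelerates can be written directly in a few lines. The sketch below is the naive O(N²) baseline, not the FMM and not the authors' code. It assumes a 1/r potential and distinct point positions, and `direct_potentials` is a hypothetical name. In the FMM, a sum of this kind is evaluated only between nearby points (the local summation part), while far-field contributions are handled through translations.

```python
import math


def direct_potentials(positions, charges):
    """Direct O(N^2) evaluation of phi_i = sum over j != i of q_j / |x_i - x_j|.

    positions -- list of 3D coordinate tuples (assumed pairwise distinct)
    charges   -- list of source strengths, one per position
    """
    n = len(positions)
    phi = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:  # skip the self-interaction term
                r = math.dist(positions[i], positions[j])
                phi[i] += charges[j] / r
    return phi


# Two unit charges two units apart: each sees a potential of 1/2.
print(direct_potentials([(0, 0, 0), (2, 0, 0)], [1.0, 1.0]))
```

Every target point is independent of the others in this loop, which is why the local summation maps so naturally onto a GPU.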
We are pleased to announce a three-day workshop on “Programming of Heterogeneous Systems in Physics”, to be held on 5-7 October 2011 at Friedrich-Schiller University, Jena, Germany. This workshop will focus on:
- Solving partial differential equations efficiently on heterogeneous computing systems. There is some emphasis on GPU computing, but other accelerators and the efficient use of large multi-core cluster nodes are considered as well.
- Optimization of computational kernels coming from finite differences, spectral methods, and lattice gauge theory on accelerators.
We plan to have a tutorial day, two days of talks and a poster session. We plan for discussion and talks to provide an overview of current work in these areas, and to develop future lines of research and collaborations. The deadline for submission of talks is 15 August 2011.
Please visit http://wwwsfb.tpi.uni-jena.de/Events/Event-PHSP11.shtml for more information. This workshop is organised by G. Zumbusch (Chair, Jena), B. Bruegmann (Jena), A. Weyhausen (Jena), L. Rezzolla (Potsdam) and B. Zink (Tuebingen).
This paper describes the approach and the speedup obtained in performing Smith-Waterman database searches on heterogeneous platforms comprising multi-core CPU and multi-GPU systems. Most of the advanced and optimized versions of the Smith-Waterman algorithm, viz. SWPS3, based on x86 SSE2 instructions, and CUDASW++ v2.0, a CUDA implementation on GPU, have demonstrated remarkable speedup over NCBI BLAST versions. This work proposes a hybrid Smith-Waterman algorithm that integrates the state-of-the-art CPU and GPU solutions for accelerating Smith-Waterman. The GPU acts as a co-processor and shares the workload with the CPU, enabling us to realize remarkable performance of over 70 GCUPS from simultaneous CPU-GPU execution. In this work, both CPU and GPU are graded equally in performance for Smith-Waterman, in contrast to previous approaches that either port the computationally intensive portions onto the GPUs or use a naive multi-core CPU approach.
(J. Singh and I. Aruni: “Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems”, Proceedings of Bioinformatics and Biomedical Engineering (iCBBE), May 2011. [DOI])
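As a reference point for what is being accelerated, here is a minimal sketch of the Smith-Waterman local alignment score, together with the GCUPS metric (billions of dynamic-programming cell updates per second). It is plain scalar Python, not the hybrid CPU-GPU method of the paper. The scoring values (match 2, mismatch -1, linear gap -1) and the function names are assumptions chosen for illustration; production tools such as SWPS3 and CUDASW++ use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: one cell per (query, database) residue pair."""
    return query_len * db_residues / seconds / 1e9


print(smith_waterman_score("ACGT", "ACGT"))  # identical sequences: 4 matches
print(gcups(1000, 70_000_000, 1.0))
```

Each database sequence is scored independently of the others, which is what makes it possible to split a database search between CPU and GPU workers.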
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academia partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.
With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.
Learn more at developer.amd.com/afds.