May 18th, 2015
May 18th, 2015
As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler and application level. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
Sparsh Mittal and Jeffrey Vetter, “A Survey of CPU-GPU Heterogeneous Computing Techniques”, accepted in ACM Computing Surveys, 2015. WWW
April 22nd, 2015
Geometric Algebra is a new, geometrically intuitive mathematical system. It provides very easy algorithms for many application areas such as computer graphics, computer vision, robotics and computer simulations. The HSA Foundation (Heterogeneous System Architecture Foundation) is a not-for-profit industry standards body founded by companies such as AMD, ARM Samsung and Texas Instruments and focused on making it dramatically easier to program heterogeneous computing devices such as GPUs.
Since Gaalop (Geometric algebra algorithms optimizer) is focusing exactly on the optimization and integration of Geometric Algebra in these kind of new parallel computing architectures, this technology together with the new Kalmar C++ AMP compiler provides a solution for Math, Science & Engineering for HSA.
September 12th, 2014
Stanford, CA – 21 April 2015. The organisers of IWOCL (“eye-wok-ul”), the International Workshop on OpenCL, today announced that AMD and HP have sponsored the Advanced Hands-On OpenCL Tutorial that will kick-off IWOCL 2015. The tutorial, which will focus on advanced OpenCL concepts, is an extension of the highly successful ‘Hands on OpenCL’ course which has received over 3,000 downloads. Simon McIntosh-Smith, Senior Lecturer in High Performance Computing and Architectures at the University of Bristol and one of the authors of the original open-source course will lead the tutorial.
The full-day Advanced Hands-On OpenCL tutorial takes place on Monday 11th May at the Li Ka Shing Center, Stanford University. Registration is $145. For additional information visit: http://www.iwocl.org/conf-2015/handsonopencl-tutorial/ Read the rest of this entry »
August 20th, 2014
This tutorial will begin with a brief overview of OpenCL and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, OpenCL syntax and work-item hierarchy. For more information and to register visit: http://acceleware.com/event/introduction-opencl-using-amd-gpus
February 26th, 2014
In the course of less than a decade, Graphics Processing Units (GPUs) have evolved from narrowly scoped application specific accelerators to general-purpose parallel machines capable of accommodating an ever-growing set of algorithms. At the same time, programming GPUs appears to have become trapped around an attractor characterised by ad-hoc practices, non-portable implementations and inexact, uninformative performance reporting. The purpose of this paper is two-fold, on one hand pursuing an in-depth look at GPU hardware and its characteristics, and on the other demonstrating that portable, generic, mathematically grounded programming of these machines is possible and desirable. An agent-based meta-heuristic, the Max-Min Ant System (MMAS), provides the context. The major contributions brought about by this article are the following: (1) an optimal, portable, generic-algorithm based MMAS implementation is derived; (2) an in-depth analysis of AMD’s Graphics Core Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing performance are employed. This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment.
(A. Voicu: “Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP ”. International Journal of Computer Applications 100(6):21-30, August 2014. [DOI])
November 13th, 2013
On March 5 at 11:00am (PST), Acceleware hosts a webinar on accelerating a seismic algorithm on a cluster of AMD GPU compute nodes. The presentation will begin with an outline of the full waveform inversion (FWI) algorithm, followed by an introduction to OpenCL. The OpenCL programming model and memory spaces will be introduced. Strategies for formulating the problem to take advantage of the massively parallel GPU architecture, and key optimizations techniques are discussed including coalescing and an iterative approach to handle the slices. Performance results for the GPU are compared to the CPU run times. Click here to register.
July 14th, 2013
AMD CodeXL is a free set of tools for GPU debugging, GPU profiling, static analysis of OpenCL kernels, and CPU profiling, including support for remote servers. For more information and download links, see: http://developer.amd.com/community/blog/2013/11/08/codexl-1-3-released/
Bolt is an STL compatible C++ template library for creating data-parallel applications using C++ (no C++ AMP / OpenCL code required). For more information about the Bolt template library and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/bolt-c-template-library/
AMD APP SDK has everything needed to get started with OpenCL and parallel programming. It includes OpenCL samples that are very easy to compile, as well as the Bolt and other libraries. For more information about AMD APP SDK and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/
February 25th, 2013
From a recent press release:
AMD’s APP SDK is an essential resource for developers who wish to leverage the processing power of heterogeneous computing. OpenCL™ is the primary mechanism for achieving this today, but AMD’s goal is to enable developers to accelerate applications with the programming paradigm of their choice. Toward that end, AMD has added support for heterogeneous libraries such as the newly released Bolt open source C++ template library and OpenCV computer vision library which now includes heterogeneous acceleration.
New to APP SDK 2.8.1:
Bolt: With the recent launch of Bolt 1.0, AMD has added several samples to the APP SDK to demonstrate Bolt 1.0 features. These showcase the usage of Bolt APIs such as scan, sort, reduce and transform. Other new samples highlight the ease of porting from STL and the performance benefits achieved over equivalent STL implementations. We’ve also included samples to demonstrate the different fallback options available in Bolt 1.0 when no GPU is available which ensure your code runs correctly on any platform.
OpenCV: AMD has been working closely with the OpenCV open source community to add heterogeneous acceleration capability to the world’s most popular computer vision library. These changes are already integrated into OpenCV and are readily available for developers who want to improve performance and efficiency of their computer vision applications. AMD has included samples to illustrate these improvements and highlight how simple it is to include them in your app.
GCN: AMD recently launched its new Graphics Core Next (GCN) architecture on several AMD products. GCN is based on a scalar architecture vs. the VLIW vector architecture of prior generations, so hand-tuned vectorization to optimize hardware utilization is no longer needed. We’ve modified several samples in AMD APP SDK 2.8.1 to show the ease of writing scalar code as compared to vectorization.
For more information, see developer.amd.com.
January 29th, 2013
Calling all software development innovators in general purpose GPU (GPGPU), data parallel and heterogeneous computing. On November 11-14, 2013 AMD will host the AMD 2013 Developer Summit in San Jose California. The AMD Developer Summit conference board has issued a call for presentation proposals, inviting creators of next-generation software to share research and development work through presentations based on the latest technical papers or reports.
The AMD Developer Summit will be a great venue for developers, academics and innovative entrepreneurs to network with others engaged in related work, collectively defining the future course of heterogeneous computing. And delivering a presentation offers you the perfect opportunity to advocate programming paradigms or gain support for industry standards.
The submission deadline is Mar. 15, 2013, and the submission website is available at: https://www.easychair.org/conferences/?conf=ads2013
AccelerEyes has released dates for their upcoming CUDA and OpenCL training courses.
More information can be found on the courses’ webpages.