Webinar Sep. 17: An Introduction to OpenCL using AMD GPUs

September 12th, 2014

This tutorial will begin with a brief overview of OpenCL and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, OpenCL syntax and work-item hierarchy. For more information and to register visit: http://acceleware.com/event/introduction-opencl-using-amd-gpus
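As a small taste of the work-item hierarchy mentioned above, the following sketch emulates in plain C++ (not OpenCL) how a one-dimensional global index relates to a work-group index and a local index. The names `WorkItem`, `global_id` and `locate` are hypothetical and chosen only for illustration; the relation itself assumes a zero global offset.

```cpp
#include <cassert>
#include <cstddef>

// Position of one work-item in a 1-D index space.
// Fields mirror what get_group_id(0) and get_local_id(0) would report.
struct WorkItem {
    std::size_t group_id;  // index of the work-group
    std::size_t local_id;  // index of the work-item inside its work-group
};

// With a zero global offset:
//   global id = group id * work-group size + local id
std::size_t global_id(const WorkItem& wi, std::size_t local_size) {
    assert(wi.local_id < local_size);
    return wi.group_id * local_size + wi.local_id;
}

// Inverse mapping: recover the group and local indices from a global index.
WorkItem locate(std::size_t gid, std::size_t local_size) {
    assert(local_size > 0);
    return WorkItem{gid / local_size, gid % local_size};
}
```

For example, with a work-group size of 64, the work-item with local index 5 in work-group 2 has global index 2 * 64 + 5 = 133, and `locate(133, 64)` maps back to the same pair.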

Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP

August 20th, 2014


In the course of less than a decade, Graphics Processing Units (GPUs) have evolved from narrowly scoped application specific accelerators to general-purpose parallel machines capable of accommodating an ever-growing set of algorithms. At the same time, programming GPUs appears to have become trapped around an attractor characterised by ad-hoc practices, non-portable implementations and inexact, uninformative performance reporting. The purpose of this paper is two-fold, on one hand pursuing an in-depth look at GPU hardware and its characteristics, and on the other demonstrating that portable, generic, mathematically grounded programming of these machines is possible and desirable. An agent-based meta-heuristic, the Max-Min Ant System (MMAS), provides the context. The major contributions brought about by this article are the following: (1) an optimal, portable, generic-algorithm based MMAS implementation is derived; (2) an in-depth analysis of AMD’s Graphics Core Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing performance are employed. This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment.

(A. Voicu: “Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP”. International Journal of Computer Applications 100(6):21-30, August 2014. [DOI])

Webinar: Accelerating Full Waveform Inversion via OpenCL on AMD GPUs

February 26th, 2014

On March 5 at 11:00am (PST), Acceleware will host a webinar on accelerating a seismic algorithm on a cluster of AMD GPU compute nodes. The presentation will begin with an outline of the full waveform inversion (FWI) algorithm, followed by an introduction to OpenCL, including its programming model and memory spaces. It will then cover strategies for formulating the problem to take advantage of the massively parallel GPU architecture, along with key optimization techniques, including coalescing and an iterative approach to handle the slices. Finally, GPU performance results will be compared with CPU run times. Click here to register.

New Versions of AMD CodeXL, Bolt and AMD APP SDK

November 13th, 2013

AMD CodeXL is a free set of tools for GPU debugging, GPU profiling, static analysis of OpenCL kernels, and CPU profiling, including support for remote servers. For more information and download links, see: http://developer.amd.com/community/blog/2013/11/08/codexl-1-3-released/

Bolt is an STL-compatible C++ template library for creating data-parallel applications using C++ (no C++ AMP / OpenCL code required). For more information about the Bolt template library and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/bolt-c-template-library/
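To give a flavour of the STL idioms that Bolt is described as being compatible with, here is a minimal sketch in plain standard C++. It is not Bolt code: Bolt's own headers, namespaces and signatures are deliberately not shown. It simply chains the four primitives most often associated with data-parallel libraries (transform, sort, scan and reduce). The names `Summary` and `summarize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Results of the four steps, kept together for inspection.
struct Summary {
    std::vector<int> sorted_squares;  // after transform + sort
    std::vector<int> running_totals;  // after scan (inclusive prefix sum)
    int total;                        // after reduce
};

Summary summarize(std::vector<int> values) {
    // transform: square every element in place
    std::transform(values.begin(), values.end(), values.begin(),
                   [](int v) { return v * v; });

    // sort: ascending order
    std::sort(values.begin(), values.end());

    // scan: inclusive prefix sum of the sorted squares
    std::vector<int> running(values.size());
    std::partial_sum(values.begin(), values.end(), running.begin());

    // reduce: sum of all elements
    const int total = std::accumulate(values.begin(), values.end(), 0);

    // Sanity check: the last element of an inclusive scan equals the reduction.
    assert(running.empty() || running.back() == total);

    return Summary{values, running, total};
}
```

Calling `summarize({3, -1, 2})` squares the input to 9, 1, 4, sorts it to 1, 4, 9, produces the running totals 1, 5, 14 and a total of 14.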

AMD APP SDK has everything needed to get started with OpenCL and parallel programming. It includes OpenCL samples that are very easy to compile, as well as Bolt and other libraries. For more information about AMD APP SDK and download links, see: http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/

AMD Releases APP SDK 2.8.1 with support for Bolt C++ Template Library, OpenCV, and GCN

July 14th, 2013

From a recent press release:

AMD’s APP SDK is an essential resource for developers who wish to leverage the processing power of heterogeneous computing. OpenCL™ is the primary mechanism for achieving this today, but AMD’s goal is to enable developers to accelerate applications with the programming paradigm of their choice. Toward that end, AMD has added support for heterogeneous libraries such as the newly released Bolt open source C++ template library and OpenCV computer vision library which now includes heterogeneous acceleration.

New to APP SDK 2.8.1:

Bolt: With the recent launch of Bolt 1.0, AMD has added several samples to the APP SDK to demonstrate Bolt 1.0 features. These showcase the usage of Bolt APIs such as scan, sort, reduce and transform. Other new samples highlight the ease of porting from STL and the performance benefits achieved over equivalent STL implementations. We’ve also included samples to demonstrate the different fallback options available in Bolt 1.0 when no GPU is available which ensure your code runs correctly on any platform.

OpenCV: AMD has been working closely with the OpenCV open source community to add heterogeneous acceleration capability to the world’s most popular computer vision library. These changes are already integrated into OpenCV and are readily available for developers who want to improve performance and efficiency of their computer vision applications. AMD has included samples to illustrate these improvements and highlight how simple it is to include them in your app.

GCN: AMD recently launched its new Graphics Core Next (GCN) architecture on several AMD products. GCN is based on a scalar architecture vs. the VLIW vector architecture of prior generations, so hand-tuned vectorization to optimize hardware utilization is no longer needed. We’ve modified several samples in AMD APP SDK 2.8.1 to show the ease of writing scalar code as compared to vectorization.

For more information, see developer.amd.com.

Call for Papers: AMD 2013 Developer Summit

February 25th, 2013

Calling all software development innovators in general purpose GPU (GPGPU), data parallel and heterogeneous computing. On November 11-14, 2013, AMD will host the AMD 2013 Developer Summit in San Jose, California. The AMD Developer Summit conference board has issued a call for presentation proposals, inviting creators of next-generation software to share research and development work through presentations based on the latest technical papers or reports.

The AMD Developer Summit will be a great venue for developers, academics and innovative entrepreneurs to network with others engaged in related work, collectively defining the future course of heterogeneous computing. And delivering a presentation offers you the perfect opportunity to advocate programming paradigms or gain support for industry standards.

The submission deadline is Mar. 15, 2013, and the submission website is available at: https://www.easychair.org/conferences/?conf=ads2013

Parallel Computing Training Dates from AccelerEyes

January 29th, 2013

AccelerEyes has released dates for their upcoming CUDA and OpenCL training courses.

More information can be found on the courses’ webpages.

Acceleware parallel programming courses

January 25th, 2013

Acceleware has recently announced four courses on parallel programming.

More information is available on the courses’ webpages.

AMD CodeXL: comprehensive developer tool suite for heterogeneous compute

October 9th, 2012

AMD CodeXL is a new unified developer tool suite that enables developers to harness the benefits of CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available for free, both as a Visual Studio® extension and a standalone user interface application for Windows® and Linux®.

AMD CodeXL increases developer productivity by helping developers identify programming errors and performance issues in their applications quickly and easily. Developers can now debug, profile and analyze their applications with a full system-wide view on AMD APUs, GPUs and CPUs.

The AMD CodeXL user group (requires registration) allows users to interact with the CodeXL team, provide feedback, get support and participate in beta surveys.

Implementing a code generator for fast matrix multiplication in OpenCL on the GPU

July 11th, 2012


This paper presents results of an implementation of a code generator for fast general matrix multiply (GEMM) kernels. When a set of parameters is given, the code generator produces the corresponding GEMM kernel written in OpenCL. The produced kernels are optimized for high-performance implementation on GPUs from AMD. Access latency to GPU global memory is the main obstacle to high performance. This study shows that storing matrix data in a block-major layout increases the performance and stability of GEMM kernels. On the Tahiti GPU (Radeon HD 7970), our DGEMM (double-precision GEMM) and SGEMM (single-precision GEMM) kernels achieve up to 848 GFlop/s (90% of the peak) and 2646 GFlop/s (70%), respectively.

(K. Matsumoto, N. Nakasato, S. G. Sedukhin: “Implementing a code generator for fast matrix multiplication in OpenCL on the GPU”, accepted for Special Session: Auto-Tuning for Multicore and GPU (ATMG), IEEE 6th International Symposium on Embedded Multicore SoCs (MCSoC-12), Sep. 2012. [PDF])
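The block-major layout mentioned in the abstract can be illustrated with a short host-side sketch in plain C++. This is not the authors' generator or kernel code. It assumes square blocks, row-major ordering both of the blocks and of the elements inside each block, and matrix dimensions that are multiples of the block size; the paper's exact layout may differ. The names `block_major_index` and `to_block_major` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a block-major array.
// Assumption: square blocks of size `block`, stored one after another in
// row-major block order, with row-major element order inside each block.
std::size_t block_major_index(std::size_t row, std::size_t col,
                              std::size_t cols, std::size_t block) {
    assert(block > 0 && cols % block == 0);
    const std::size_t blocks_per_row = cols / block;
    const std::size_t block_id = (row / block) * blocks_per_row + (col / block);
    const std::size_t within = (row % block) * block + (col % block);
    return block_id * block * block + within;
}

// Copy a row-major rows x cols matrix into block-major order, so that each
// block occupies one contiguous run of memory.
std::vector<float> to_block_major(const std::vector<float>& a,
                                  std::size_t rows, std::size_t cols,
                                  std::size_t block) {
    assert(block > 0 && rows % block == 0 && a.size() == rows * cols);
    std::vector<float> out(a.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[block_major_index(r, c, cols, block)] = a[r * cols + c];
        }
    }
    return out;
}
```

For a 4 x 4 matrix holding 0 to 15 in row-major order and a block size of 2, the result starts with 0, 1, 4, 5, which is the top-left 2 x 2 block stored contiguously.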
