In the course of less than a decade, Graphics Processing Units (GPUs) have evolved from narrowly scoped application specific accelerators to general-purpose parallel machines capable of accommodating an ever-growing set of algorithms. At the same time, programming GPUs appears to have become trapped around an attractor characterised by ad-hoc practices, non-portable implementations and inexact, uninformative performance reporting. The purpose of this paper is two-fold, on one hand pursuing an in-depth look at GPU hardware and its characteristics, and on the other demonstrating that portable, generic, mathematically grounded programming of these machines is possible and desirable. An agent-based meta-heuristic, the Max-Min Ant System (MMAS), provides the context. The major contributions brought about by this article are the following: (1) an optimal, portable, generic-algorithm based MMAS implementation is derived; (2) an in-depth analysis of AMD’s Graphics Core Next (GCN) GPU and the C++ AMP programming model is supplied; (3) a more robust approach to performance reporting is presented; (4) novel techniques for raising the abstraction level without sacrificing performance are employed. This represents the first implementation of an algorithm from the Ant Colony Optimisation (ACO) family using C++ AMP, whilst at the same time being one of the first uses of the latter programming environment.
(A. Voicu: “Accelerated Combinatorial Optimization using Graphics Processing Units and C++ AMP ”. International Journal of Computer Applications 100(6):21-30, August 2014. [DOI])
This hands-on four day course teaches how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. More details and registration: http://acceleware.com/training/986
Hybrid Fortran is an Open Source directive based extension for the Fortran language. It is a way for HPC programmers to keep writing Fortran code like they are used to – only now with GPGPU support. It achieves performance portability by allowing different storage orders and loop structures for the CPU and GPU version. All computational code stays the same as in the respective CPU version, e.g. it can be kept in a low dimensionality even when the GPU version needs to be privatised in more dimensions in order to achieve a speedup. Hybrid Fortran takes care of the necessary transformations at compile-time (so there is no runtime overhead). A (python based) preprocessor parses these annotations together with the Fortran user code structure, declarations, accessors and procedure calls, and then writes separate versions of the code – once for CPU with OpenMP parallelization and once for GPU with CUDA Fortran. More details: http://typhooncomputing.com/?p=416
The course on Antenna Synthesis (with elements of GPU computing) is organized in the framework of the European School of Antennas. The course will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. It faces three topics corresponding to the two main aspects of Antenna Synthesis, namely external and internal synthesis, and to numerical and implementation issues on High Performance Computing (HPC) platforms of synthesis algorithms. For details about the course please see this brochure and http://www.antennasvce.org/Community/Education/Courses?id_folder=533.
This blog entry provides an introduction to GPU virtualization, reviewing the five major technology vendors and their virtualization support for CUDA.
“GPU Algorithms for Image Processing and Computer Vision”, to be published by Springer, will contain a collection of articles on fundamental image processing and computer vision methods adapted for Graphics Processing Units (GPUs). In recent years, substantial efforts were undertaken to adapt many such algorithms for massively-parallel GPU-based systems. The book is envisioned as a consolidation of such work into a single volume covering widely used methods and techniques. Each chapter will be written by authors working on a specific group of methods. It will provide mathematical background, parallel algorithm, and implementation details leading to reusable, adaptable, and scalable code fragments. The book will serve as a GPU implementation manual for many image processing and analysis algorithms providing valuable insights into parallelization strategies for GPUs as well as ready-to-use code fragments with a broad appeal to both developers and researchers interested in GPU computing. Read the rest of this entry »
A new book titled “Numerical Computations with GPUs” has been published:
This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and engineering computations. Each chapter is written by authors working on a specific group of methods; these leading experts provide mathematical background, parallel algorithms and implementation details leading to reusable, adaptable and scalable code fragments. This book also serves as a GPU implementation manual for many numerical algorithms, sharing tips on GPUs that can increase application efficiency. The valuable insights into parallelization strategies for GPUs are supplemented by ready-to-use code fragments. Numerical Computations with GPUs targets professionals and researchers working in high performance computing and GPU programming. Advanced-level students focused on computer science and mathematics will also find this book useful as secondary text book or reference.
From the table of contents: Read the rest of this entry »
Partnering with NVIDIA, this four day CUDA training course, held in Houston is designed for programmers in the oil and gas industry who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. Commonly used algorithms such as filtering and FFTs will be used and profiled in the examples. The case study on day 4 focuses on efficient implementation of a finite difference algorithm which is highly applicable to reverse time migration. However a background in oil and gas is not necessary. For more information and to view a copy of the course outline please visit: http://acceleware.com/training/987
Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It provides a low-level C++ wrapper over OpenCL and high-level STL-like API with containers and algorithms for the GPU. It is available on GitHub and instructions for getting started can be found in the documentation. See the full announcement here: http://kylelutz.blogspot.com/2014/07/boost-compute-v0.3-released.html
A new version of the rCUDA middleware has been released (version 4.2). In addition to fix some minor bugs, the new release provides support for:
- CUDA 6.0 Runtime API
- New stream management
- cuSPARSE libraries
The rCUDA middleware allows to seamlessly use, within your cluster, GPUs that are installed in computing nodes different from the one that is executing the CUDA application, without requiring to modify your program. Please visit www.rcuda.net for more details about the rCUDA technology.