As both CPUs and GPUs are employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has its own unique features and strengths, and hence CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler and application levels. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will give researchers insights into the working and scope of applications of HCTs and motivate them to further harness the computational power of CPUs and GPUs to achieve the goal of exascale performance.
Sparsh Mittal and Jeffrey Vetter, “A Survey of CPU-GPU Heterogeneous Computing Techniques”, accepted in ACM Computing Surveys, 2015.
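As a rough illustration of the workload partitioning mentioned in the abstract, the sketch below splits a batch of work items between a CPU and a GPU in proportion to their throughput. It is not taken from the survey: the function name and the throughput figures are hypothetical, and real schemes often adjust the split dynamically at runtime.

```python
def partition_workload(n_items, cpu_rate, gpu_rate):
    """Split n_items between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are assumed, previously measured throughputs
    (items per second). Returns a tuple (cpu_items, gpu_items).
    """
    if n_items < 0 or cpu_rate < 0 or gpu_rate < 0:
        raise ValueError("inputs must be non-negative")
    total_rate = cpu_rate + gpu_rate
    if total_rate == 0:
        raise ValueError("at least one device needs a positive rate")
    # Give the GPU its proportional share; the CPU takes the remainder,
    # so the two shares always add up to n_items.
    gpu_items = round(n_items * gpu_rate / total_rate)
    cpu_items = n_items - gpu_items
    return cpu_items, gpu_items


# Hypothetical rates: the GPU processes items three times faster.
print(partition_workload(1000, 1.0, 3.0))  # (250, 750)
```

With this static split both devices would finish at about the same time, provided the measured rates hold for the actual workload.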
Developers have long used utility tools such as CPU-Z, GPU-Z, CUDA-Z and OpenCL-Z. These tools provide detailed platform and hardware information and help developers quickly understand hardware capabilities.
Recently, most of the latest mobile phones and tablets have gained OpenCL support as mobile GPUs gain more compute power. OpenCL-Z Android helps developers quickly detect whether OpenCL is available on a device and get information about OpenCL-capable platforms and devices.
In addition to detecting OpenCL capability and reporting device information, OpenCL-Z Android can measure raw compute power in terms of peak ALU GFLOPS and memory bandwidth. These numbers are useful for developers who want to take advantage of the compute capability of modern GPUs. Developers can roughly predict the performance of an algorithm on a specific platform, or compare raw compute performance across platforms.
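The arithmetic behind such numbers can be sketched as follows. This is not how OpenCL-Z itself computes or uses them: the function names are hypothetical, and the prediction is a simple roofline-style lower bound that assumes a kernel is limited either by peak compute or by peak memory bandwidth.

```python
def gflops(flop_count, seconds):
    """Floating-point operations per second, in units of 10^9."""
    return flop_count / seconds / 1e9


def bandwidth_gb_per_s(bytes_moved, seconds):
    """Memory traffic per second, in units of 10^9 bytes."""
    return bytes_moved / seconds / 1e9


def estimated_seconds(flop_count, bytes_moved, peak_gflops, peak_gb_per_s):
    """Rough lower bound on the runtime of a kernel on a given device.

    The kernel cannot finish faster than its arithmetic allows at peak
    GFLOPS, nor faster than its memory traffic allows at peak bandwidth.
    """
    compute_time = flop_count / (peak_gflops * 1e9)
    memory_time = bytes_moved / (peak_gb_per_s * 1e9)
    return max(compute_time, memory_time)


# Hypothetical device: 100 GFLOPS peak, 20 GB/s memory bandwidth.
# Hypothetical kernel: 1e9 floating-point operations over 4e9 bytes.
t = estimated_seconds(1e9, 4e9, peak_gflops=100.0, peak_gb_per_s=20.0)
print(f"estimated runtime: at least {t:.3f} s")
```

In this made-up case the memory term dominates, so the kernel would be bandwidth-bound on that device. Measured peak figures are upper limits, and real kernels usually achieve only a fraction of them.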
OpenCL-Z Android is free software and is now available on Google Play:
Download link at Google Play
The major features of OpenCL-Z Android:
– detect OpenCL availability;
– detect OpenCL driver library;
– display detailed OpenCL platform information;
– display detailed OpenCL device information;
– measure the raw compute performance and memory system bandwidth;
– export OpenCL information to sdcard;
– share OpenCL information with other applications, such as e-mail clients, note applications, social media and so on.
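The first two features, detecting OpenCL availability and locating the driver library, can be approximated by trying to load the library from a list of candidate locations. The sketch below is not OpenCL-Z's implementation: the candidate names and paths are assumptions that vary by vendor and OS version, and the helper names are hypothetical. The only OpenCL call used is the standard `clGetPlatformIDs`.

```python
import ctypes

# Assumed candidate names and paths; actual locations differ per device.
CANDIDATE_LIBRARIES = [
    "libOpenCL.so",
    "libOpenCL.so.1",
    "/system/vendor/lib/libOpenCL.so",
    "/system/lib/libOpenCL.so",
]


def find_opencl_library(candidates=CANDIDATE_LIBRARIES, loader=ctypes.CDLL):
    """Return (name, handle) for the first loadable candidate, else (None, None)."""
    for name in candidates:
        try:
            return name, loader(name)
        except OSError:
            continue  # not present or not loadable, try the next candidate
    return None, None


def count_platforms(lib):
    """Ask the loaded library how many OpenCL platforms it exposes."""
    lib.clGetPlatformIDs.restype = ctypes.c_int
    lib.clGetPlatformIDs.argtypes = [
        ctypes.c_uint, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint)
    ]
    num_platforms = ctypes.c_uint(0)
    # num_entries=0 with platforms=NULL only queries the platform count.
    status = lib.clGetPlatformIDs(0, None, ctypes.byref(num_platforms))
    return num_platforms.value if status == 0 else 0  # 0 is CL_SUCCESS


if __name__ == "__main__":
    name, lib = find_opencl_library()
    if lib is None:
        print("No OpenCL library found among the candidates")
    else:
        print(f"Loaded {name}; platforms reported: {count_platforms(lib)}")
```

A library that loads but reports zero platforms usually points to a missing or misconfigured driver rather than a complete lack of OpenCL support.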
OpenCL-Z Android has been tested on mobile devices with Qualcomm Snapdragon 8064, 8974, 8084 and 8994 chipsets (with Adreno 305, 320, 330, 420 and 430 GPUs), Samsung Exynos 5420 and 5433 chipsets (with Mali T628 and T760 GPUs), the MediaTek MT6752 chipset (with Mali T760 GPU), and the Rockchip RK3288 (with Mali T764 GPU).
OpenCL-Z Android should also work on other chipsets. If your device is known to support OpenCL but this tool fails to detect it, please contact the developer of OpenCL-Z.
The author of OpenCL-Z is also trying to compile a relatively complete list of mobile devices that support OpenCL; the list can be found at the OpenCL-Z official website. If you know of a device that supports OpenCL but is not on that list, please send the author an email and help the list grow.
Stanford, CA – 21 April 2015. The organisers of IWOCL (“eye-wok-ul”), the International Workshop on OpenCL, today announced that AMD and HP have sponsored the Advanced Hands-On OpenCL Tutorial that will kick off IWOCL 2015. The tutorial, which will focus on advanced OpenCL concepts, is an extension of the highly successful ‘Hands on OpenCL’ course, which has received over 3,000 downloads. Simon McIntosh-Smith, Senior Lecturer in High Performance Computing and Architectures at the University of Bristol and one of the authors of the original open-source course, will lead the tutorial.
The full-day Advanced Hands-On OpenCL tutorial takes place on Monday 11th May at the Li Ka Shing Center, Stanford University. Registration is $145. For additional information visit: http://www.iwocl.org/conf-2015/handsonopencl-tutorial/
Acceleware’s next OpenCL course takes place in Calgary. This professional four-day course is designed for programmers who want to develop comprehensive skills in writing and optimizing applications that fully leverage the data-parallel processing capabilities of GPUs. Register before May 12 to reserve a spot. To find out what the course includes, visit:
Learn OpenCL in Calgary www.acceleware.com
PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi.
The 1.0 version of the PARALUTION Library supports multi-node and multi-GPU configuration via MPI. All iterative solvers support global operations (i.e. distributed matrices and vectors) and all preconditioners can be used in a block-Jacobi fashion locally on each node/GPU. In addition, the software provides a global (fully distributed) Pair-Wise AMG solver.
After four pre-releases, the stable 2.0.0 version of cf4ocl, the C Framework for OpenCL, is now available.
Since the last beta release, a number of tests have been added and a few bugs have been fixed. Support for device fission and native kernels has also been implemented. A complete list of features and fixes is available at https://github.com/FakenMC/cf4ocl/releases.
Cf4ocl has been tested on Linux, OS X and Windows, and offers a pure C object-oriented framework for developing and benchmarking OpenCL projects in C. It aims to:
1. Promote the rapid development of OpenCL host programs in C (with support for C++) and avoid the tedious and error-prone boilerplate code usually required.
Boost.Compute is an open-source, header-only C++ library for GPGPU and parallel computing based on OpenCL. It provides a low-level C++ wrapper over OpenCL and a high-level STL-like API with containers and algorithms for the GPU. Boost.Compute is available on GitHub and its documentation can be found here. See the full announcement here: http://kylelutz.blogspot.com/2014/12/boost-compute-0.4-released.html
PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi. The new 0.8.0 release provides the following extra features:
- Complex support
- TNS, Variable preconditioner
- BiCGStab(l), QMRCGStab, FCG solvers
- RS and PairWise AMG
- SIRA eigenvalue solver
- Replace/Extract column/row functions
- Stencil computation
For details, visit http://www.paralution.com.
The Cf4ocl project is a GPLv3/LGPLv3 initiative to provide an object-oriented interface to the OpenCL C API with integrated profiling, promoting the rapid development of OpenCL host programs and avoiding boilerplate code. Its main goal is to allow developers to focus on OpenCL device code. After two alpha releases, the first beta is out and can be tested on Linux, Windows and OS X. The framework is independent of the OpenCL platform version and vendor, and includes utilities to simplify the analysis of the OpenCL environment and of kernel requirements. While the project is making progress, it does not yet offer OpenGL/DirectX interoperability or support for sub-devices, pipes and SVM.
Cf4ocl can be downloaded from http://fakenmc.github.io/cf4ocl/.
Version 2.0 of OpenCLIPP, an open-source OpenCL library for computer vision and image processing primitives, has been released. For more information about the library, for programming contributions and for download, please refer to the OpenCLIPP Website.