CUDA and OpenCL Training Courses 2015

June 11th, 2015

“Acceleware offers industry leading training courses for software developers looking to increase their skills in writing or optimizing applications for highly parallel processing. The training focuses on using GPUs for computing and the associated popular programming languages.

The courses are all taught by experienced programmers who provide real world experience, derived from Acceleware’s 9 years of building commercial GPU applications.

Clients will access our top rated training techniques for parallel programming.

We offer public and private courses (private courses require a minimum of 6 students). The 2015 training schedule for public courses is posted on the Acceleware website.”
Acceleware’s Training Courses 2015

Android OpenCL tool builds automatically searchable CL capabilities database

June 11th, 2015

At the moment, Google does not support OpenCL™ as part of the Android platform. Newer-generation devices can support it, but not all of them are equipped with the right drivers.

More and more device manufacturers include these drivers, as OpenCL™ can be very useful for accelerating specific workloads. The goal of this tool is to build a database of all OpenCL™-capable devices and their properties so that developers and users can search through this data. This lets them see how many devices have OpenCL™ support and which features are implemented, and it helps a developer decide whether it makes sense to use OpenCL™ to accelerate their application.

With the tool it is possible to browse through the database and see all devices that support OpenCL. In addition, it is possible to view all the OpenCL capabilities of your current device and of all the devices in the online database. Read the rest of this entry »

Gaalop Geometric Algebra Library for HSA

May 18th, 2015

Geometric Algebra is a new, geometrically intuitive mathematical system. It provides very easy algorithms for many application areas such as computer graphics, computer vision, robotics and computer simulations. The HSA Foundation (Heterogeneous System Architecture Foundation) is a not-for-profit industry standards body founded by companies such as AMD, ARM, Samsung and Texas Instruments and focused on making it dramatically easier to program heterogeneous computing devices such as GPUs.

Since Gaalop (Geometric algebra algorithms optimizer) focuses exactly on the optimization and integration of Geometric Algebra in these kinds of new parallel computing architectures, this technology, together with the new Kalmar C++ AMP compiler, provides a solution for Math, Science & Engineering for HSA.

OpenCL-Z Android

April 22nd, 2015

Developers have been using utility tools such as CPU-Z, GPU-Z, CUDA-Z and OpenCL-Z for a long time. These tools provide detailed platform and hardware information and help developers quickly understand the hardware capabilities.

Recently, most of the latest mobile phones and tablets have come to support OpenCL, as mobile GPUs are gaining more compute power. OpenCL-Z Android can help developers quickly detect whether OpenCL is available on a device and get information about the OpenCL-capable platform and devices.

In addition to detecting the OpenCL capability and getting device information, OpenCL-Z Android can also measure the raw compute power in terms of peak ALU GFLOPS and memory bandwidth. These numbers are useful for developers who want to take advantage of the compute capability of modern GPUs. Developers can roughly predict the performance of a certain algorithm on a specific platform, or compare the raw compute performance among platforms.
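As a rough illustration of how the two measured numbers could feed such a prediction, the sketch below applies a simple roofline-style bound: a kernel cannot finish faster than the slower of its arithmetic time and its memory-traffic time. The function names and the example figures (100 GFLOPS, 10 GB/s) are made up for illustration; they are not OpenCL-Z output or part of its interface.

```python
def resource_times(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Return (compute_seconds, memory_seconds) for one kernel launch.

    flops         -- floating-point operations the kernel performs
    bytes_moved   -- bytes read from plus written to device memory
    peak_gflops   -- measured ALU peak in GFLOPS (hypothetical input)
    bandwidth_gbs -- measured memory bandwidth in GB/s (hypothetical input)
    """
    compute_seconds = flops / (peak_gflops * 1e9)
    memory_seconds = bytes_moved / (bandwidth_gbs * 1e9)
    return compute_seconds, memory_seconds


def estimate_kernel_time(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Lower bound on runtime: the slower of the two resources dominates."""
    return max(resource_times(flops, bytes_moved, peak_gflops, bandwidth_gbs))


def bottleneck(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Name the limiting resource under this simple model."""
    compute_seconds, memory_seconds = resource_times(
        flops, bytes_moved, peak_gflops, bandwidth_gbs)
    return "compute" if compute_seconds >= memory_seconds else "memory"


# Example: SAXPY (y = a*x + y) over one million 4-byte floats.
# Each element costs 2 flops and 12 bytes (read x, read y, write y).
n = 1_000_000
saxpy_time = estimate_kernel_time(2 * n, 12 * n,
                                  peak_gflops=100.0, bandwidth_gbs=10.0)
saxpy_limit = bottleneck(2 * n, 12 * n,
                         peak_gflops=100.0, bandwidth_gbs=10.0)
```

With these assumed figures the memory term is far larger than the compute term, so the model labels SAXPY as memory-bound. The bound ignores launch overhead, caches and latency, so it is a first estimate only.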

OpenCL-Z Android is free software and is now available on Google Play:
Download link at Google Play

The major features of OpenCL-Z Android:
– detect OpenCL availability;
– detect OpenCL driver library;
– display detailed OpenCL platform information;
– display detailed OpenCL device information;
– measure the raw compute performance and memory system bandwidth;
– export OpenCL information to sdcard;
– share OpenCL information with other applications, such as e-mail clients, note applications, social media and so on.
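The first two features can be pictured with the following minimal sketch, which is not OpenCL-Z's actual implementation. It looks for a driver library in a few candidate locations and then asks the standard OpenCL entry point clGetPlatformIDs how many platforms are present. The candidate paths are assumptions (real locations vary by vendor and Android version), and both function names are hypothetical.

```python
import ctypes
import os

# Assumed candidate locations; actual paths differ between vendors.
CANDIDATE_PATHS = [
    "/system/vendor/lib64/libOpenCL.so",
    "/system/vendor/lib/libOpenCL.so",
    "/system/lib64/libOpenCL.so",
    "/system/lib/libOpenCL.so",
]


def find_opencl_library(candidates=CANDIDATE_PATHS, exists=os.path.isfile):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


def opencl_platform_count(library_path):
    """Load the library and query the number of OpenCL platforms.

    Returns None if the library cannot be loaded, does not export
    clGetPlatformIDs, or the call reports an error.
    """
    try:
        lib = ctypes.CDLL(library_path)
        get_platform_ids = lib.clGetPlatformIDs
    except (OSError, AttributeError):
        return None
    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    get_platform_ids.argtypes = [ctypes.c_uint, ctypes.c_void_p,
                                 ctypes.POINTER(ctypes.c_uint)]
    get_platform_ids.restype = ctypes.c_int
    count = ctypes.c_uint(0)
    status = get_platform_ids(0, None, ctypes.byref(count))
    return count.value if status == 0 else None  # 0 is CL_SUCCESS


library = find_opencl_library()
platforms = opencl_platform_count(library) if library else None
```

Passing the existence check as a parameter keeps the search logic testable on a machine without any OpenCL driver installed.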

The OpenCL-Z Android has been tested on mobile devices with Qualcomm Snapdragon 8064, 8974, 8084, 8994 chipsets (with Adreno 305, 320, 330, 420, 430 GPUs), Samsung Exynos 5420, 5433 chipsets (with Mali T628, T760 GPUs), MediaTek MT6752 chipset (with Mali T760 GPU), Rockchip RK3288 (with Mali T764 GPU).

The OpenCL-Z Android should be able to support other chipsets. If your device is known to have OpenCL support, but this tool fails to detect it, please contact the developer of OpenCL-Z.

The author of OpenCL-Z is also trying to create a relatively complete list of mobile devices that support OpenCL. The list can be found at the OpenCL-Z official website. If you see a device that supports OpenCL but is not on that list, please send the author an email and help the list grow.

RapidCFD: open-source CFD for GPUs

April 13th, 2015

A new open-source CFD project has just been published. RapidCFD uses NVIDIA CUDA for the entire calculation process, which gives a significant reduction in computation time.


  • most incompressible and compressible solvers on static mesh are available
  • all the calculations are done on the GPU
  • no overhead for GPU-CPU memory copy
  • can run in parallel on multiple GPUs

Visit RapidCFD project page.

PARALUTION Release 1.0

April 13th, 2015

PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi.

The 1.0 version of the PARALUTION Library supports multi-node and multi-GPU configuration via MPI. All iterative solvers support global operations (i.e. distributed matrices and vectors) and all preconditioners can be used in a block-Jacobi fashion locally on each node/GPU. In addition, the software provides a global (fully distributed) Pair-Wise AMG solver. Read the rest of this entry »
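To make the block-Jacobi idea concrete, here is a small, dense, single-process sketch. It does not use the PARALUTION API, which works on sparse distributed data. Each index range in `blocks` stands in for the rows owned by one node or GPU, and the preconditioner solves only with that owner's diagonal block, so applying it needs no communication. All function names are hypothetical, and conjugate gradient is chosen here only as a familiar example solver.

```python
def solve_dense(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def block_jacobi_apply(a, r, blocks):
    """Return z = M^-1 r, where M keeps only the diagonal blocks of a."""
    z = [0.0] * len(r)
    for start, end in blocks:  # each range models one node's local rows
        local = [row[start:end] for row in a[start:end]]
        z[start:end] = solve_dense(local, r[start:end])
    return z


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a, b, blocks, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient for a symmetric positive definite a."""
    x = [0.0] * len(b)
    r = b[:]
    z = block_jacobi_apply(a, r, blocks)
    p = z[:]
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = block_jacobi_apply(a, r, blocks)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# 1D Laplacian on 4 unknowns, split across two hypothetical "nodes".
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
rhs = [1.0, 0.0, 0.0, 1.0]
solution, iterations = pcg(A, rhs, blocks=[(0, 2), (2, 4)])
```

For this system the exact solution is a vector of ones, and `solution` should match it to within rounding error. In a distributed setting only `matvec` and `dot` would need data from other nodes.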

MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction

February 11th, 2015


GPUs play an increasingly important role in high-performance computing. While developing naive code is straightforward, optimizing massively parallel applications requires deep understanding of the underlying architecture. The developer must struggle with complex index calculations and manual memory transfers. This article classifies memory access patterns used in most parallel algorithms, based on Berkeley’s Parallel “Dwarfs.” It then proposes the MAPS framework, a device-level memory abstraction that facilitates memory access on GPUs, alleviating complex indexing using on-device containers and iterators. This article presents an implementation of MAPS and shows that its performance is comparable to carefully optimized implementations of real-world applications.

Rubin, Eri, et al. "MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction." ACM Transactions on Architecture and Code Optimization (TACO) 11.4 (2014): 44.

Library website

C Framework for OpenCL v2.0.0 Now Available

February 11th, 2015

After four pre-releases, the stable 2.0.0 version of cf4ocl, the C Framework for OpenCL, is now available.

Since the last beta release, a number of tests have been added and a few bugs have been fixed. Support for device fission and native kernels has also been implemented. A complete list of features and fixes is available at

Cf4ocl has been tested on Linux, OS X and Windows, and offers a pure C object-oriented framework for developing and benchmarking OpenCL projects in C. It aims to:

1. Promote the rapid development of OpenCL host programs in C (with support for C++) and avoid the tedious and error-prone boilerplate code usually required. Read the rest of this entry »

Boost.Compute v0.4 Released

December 27th, 2014

Boost.Compute is an open-source, header-only C++ library for GPGPU and parallel computing based on OpenCL. It provides a low-level C++ wrapper over OpenCL and a high-level STL-like API with containers and algorithms for the GPU. Boost.Compute is available on GitHub and its documentation can be found here. See the full announcement here:

PARALUTION v0.8.0 released

November 14th, 2014

PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi. The new 0.8.0 release provides the following extra features:

  • Complex support
  • TNS, Variable preconditioner
  • BiCGStab(l), QMRCGStab, FCG solvers
  • RS and PairWise AMG
  • SIRA eigenvalue solver
  • Replace/Extract column/row functions
  • Stencil computation

For details, visit
