OpenCL Compiler Tools

October 19th, 2011

OCLTools is a powerful, yet compact, suite of Open Source tools that give OpenCL developers alternatives to compiling their kernels at run time. With OCLTools, developers can embed the source code of their kernels (in clear text or encrypted) directly into their program binaries. This avoids distributing kernel source in the open while keeping the flexibility of runtime compilation. Precompiled kernel binaries can be embedded as well, which removes the kernel compilation overhead from the application's run time.
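OCLTools' own interface is not described in this post, so the following is only a minimal sketch of the general technique of embedding kernel source in a host binary. A small Python build step turns the text of a `.cl` file into a C string constant. The function names `c_string_literal` and `embed_kernel_source` are hypothetical. The generated array can be compiled into the host program and passed to `clCreateProgramWithSource` at run time. Encryption and precompiled binaries are out of scope here.

```python
def c_string_literal(text: str) -> str:
    """Escape text so it can appear inside a C double-quoted string literal."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\r":
            out.append("\\r")
        elif 32 <= ord(ch) < 127:
            out.append(ch)
        else:
            # Octal escapes have at most 3 digits, so they cannot swallow
            # following characters the way unbounded hex escapes can.
            for byte in ch.encode("utf-8"):
                out.append("\\%03o" % byte)
    return "".join(out)


def embed_kernel_source(var_name: str, source: str) -> str:
    """Return C code that defines `var_name` as the kernel source text.

    Each kernel line becomes its own literal; the C compiler concatenates
    adjacent string literals into one array.
    """
    lines = source.splitlines(keepends=True)
    body = "\n".join('    "%s"' % c_string_literal(line) for line in lines) or '    ""'
    return "static const char %s[] =\n%s;\n" % (var_name, body)


if __name__ == "__main__":
    kernel = (
        "__kernel void scale(__global float *x, float a) {\n"
        "    x[get_global_id(0)] *= a;\n"
        "}\n"
    )
    print(embed_kernel_source("scale_cl", kernel))
```

Running the script prints a `static const char scale_cl[]` definition with one string literal per kernel line. Emitting one literal per line keeps the generated header readable and diff-friendly.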

For more information go to

CfP: 20th High Performance Computing Symposium 2012

October 7th, 2011

The 2012 Spring Simulation Multi-conference will feature the 20th High Performance Computing Symposium (HPC 2012), devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include:

  • high performance/large scale application case studies,
  • GPUs for general purpose computations (GPGPU),
  • multicore and many-core computing,
  • power aware computing,
  • large scale visualization and data management,
  • tools and environments for coupling parallel codes,
  • parallel algorithms and architectures,
  • high performance software tools,
  • component technologies for high performance computing.

Important dates: Paper submission due: December 2, 2011; Notification of acceptance: January 13, 2012; Revised manuscript due: January 27, 2012; Symposium: March 26–29, 2012.

Sequence Homology Search using Fine-Grained Cycle Sharing of Idle GPUs

October 2nd, 2011


In this paper, we propose a fine-grained cycle sharing (FGCS) system capable of exploiting idle graphics processing units (GPUs) for accelerating sequence homology search in local area network environments. Our system exploits short idle periods on GPUs by running small parts of guest programs such that each part can be completed within hundreds of milliseconds. To detect such short idle periods from the pool of registered resources, our system continuously monitors keyboard and mouse activities via event handlers rather than waiting for a screensaver, as is typically deployed in existing systems. Our system also divides guest tasks into small parts according to a performance model that estimates execution times of the parts. This task division strategy minimizes any disruption to the owners of the GPU resources. Experimental results show that our FGCS system running on two non-dedicated GPUs achieves 111-116% of the throughput achieved by a single dedicated GPU. Furthermore, our system provides over two times the throughput of a screensaver-based system. We also show that the idle periods detected by our system constitute half of the system uptime. We believe that the GPUs hidden and often unused in office environments provide a powerful solution to sequence homology search.

(Fumihiko Ino, Yuma Munekawa, and Kenichi Hagihara, “Sequence Homology Search using Fine-Grained Cycle Sharing of Idle GPUs”, accepted for publication in IEEE Transactions on Parallel and Distributed Systems, Sep. 2011. [DOI])
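The task-division step can be illustrated with a simple model. This sketch assumes, purely for illustration, a linear performance model t(n) = t0 + c*n, made up of a fixed launch overhead plus a per-sequence cost. The paper's actual model is not reproduced here, and all names and numbers below are hypothetical. Given a time budget per part (for example a few hundred milliseconds), the sketch picks the largest part size whose predicted time fits and splits the guest task accordingly.

```python
import math
from typing import List, Tuple


def max_part_size(overhead_ms: float, per_item_ms: float, budget_ms: float) -> int:
    """Largest n whose predicted time overhead_ms + per_item_ms * n fits budget_ms.

    The linear model is an illustrative assumption, not the paper's model.
    """
    if per_item_ms <= 0:
        raise ValueError("per_item_ms must be positive")
    n = math.floor((budget_ms - overhead_ms) / per_item_ms)
    if n < 1:
        raise ValueError("budget cannot fit even one item")
    return n


def split_task(num_items: int, overhead_ms: float, per_item_ms: float,
               budget_ms: float) -> List[Tuple[int, int]]:
    """Split items [0, num_items) into half-open parts that each fit the budget."""
    n = max_part_size(overhead_ms, per_item_ms, budget_ms)
    return [(start, min(start + n, num_items)) for start in range(0, num_items, n)]


if __name__ == "__main__":
    # 130 query sequences, 20 ms overhead, 3 ms per sequence, 200 ms budget:
    # -> [(0, 60), (60, 120), (120, 130)]
    print(split_task(130, overhead_ms=20.0, per_item_ms=3.0, budget_ms=200.0))
```

Keeping each part short means a part that starts just before the owner returns finishes quickly. This is the property that lets the system harvest brief idle periods without disturbing the owner.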

2-day CUDA workshop in Berlin

September 24th, 2011

The second 2-day CUDA programming workshop in Berlin takes place November 5-6. Course details, outline and prices are available at

ofgpu v0.2 released: GPU linear solvers for OpenFOAM

September 24th, 2011

The latest release of Symscape’s ofgpu (v0.2) for OpenFOAM® 2.0.x is now available. ofgpu is an open source experimental linear solver library that targets NVIDIA CUDA GPU devices on Windows, Linux, and (untested) Mac OS X. ofgpu now has support for the Cusp preconditioners:

  • smoothed_aggregation – equivalent to Algebraic Multi-Grid (AMG)
  • scaled_bridson_ainv
  • bridson_ainv
  • nonsym_bridson_ainv

Also supported is the option to select the GPU device. For more details see
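The following minimal preconditioned conjugate gradient (PCG) sketch in pure Python shows where a preconditioner enters a Krylov solver. It uses a Jacobi (diagonal) preconditioner as a stand-in, which is not one of the Cusp preconditioners listed above. Like the AINV variants, it is applied as a multiplication by a cheap approximation of the inverse of A, but it is much cruder. Dense lists of lists are used for clarity, whereas ofgpu and Cusp operate on sparse matrices on the GPU. The function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def jacobi_preconditioner(A: Matrix) -> Callable[[Vector], Vector]:
    """Return a function computing z = D^-1 r, where D is the diagonal of A."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A: Matrix, b: Vector, precond: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A with preconditioned CG."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    z = precond(r)                   # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break                    # relative residual small enough
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(pcg(A, b, jacobi_preconditioner(A)))  # close to [1/11, 7/11]
```

Swapping in a stronger preconditioner changes only the `precond` callable. This mirrors how a solver library can let the user pick among several preconditioners behind one interface.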

Aparapi – Parallel programming with Java and OpenCL

September 15th, 2011

AMD has just released Aparapi, a project that began in its JavaLabs team, as open source. Aparapi is an API for expressing data-parallel workloads in Java, together with a runtime component that converts the Java bytecode of compatible workloads into OpenCL™ so that it can be executed on a variety of GPU devices. More information can be found in this blog entry.

Thrust: A Productivity-Oriented Library for CUDA

September 12th, 2011


This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.

(Nathan Bell and Jared Hoberock, “Thrust: A Productivity-Oriented Library for CUDA”, GPU Computing Gems, Jade Edition, edited by Wen-mei W. Hwu, October 2011)

An Analysis of the GPU Market

September 10th, 2011

From the abstract of a GPU market analysis whitepaper by John Peddie Research:

Computer graphics is hard work. Behind the images you see in games and movies, or while editing photos or video, some serious processing is taking place. All the processing power you can muster is needed to push and polish pixels. And this task is only going to get more demanding as these applications get more sophisticated. Graphics Processing Units (GPUs), which do the heavy lifting in computer graphics, vary greatly in size, price and performance. They range from tiny cores inside an ARM-based processor (such as Nvidia’s Tegra or Qualcomm’s Snapdragon), to graphics integrated within an x86 processor (such as AMD’s Fusion or Intel’s Sandy Bridge), to a standalone discrete device, or dGPU (such as AMD’s Radeon or Nvidia’s GeForce).

More information:

libCL 1.0 released

September 8th, 2011

libCL is an open-source parallel algorithm library written in C++ and OpenCL. Rather than targeting a specific domain, libCL aims to encompass a wide range of parallel algorithms and data structures. The goal is to provide a comprehensive repository for high-performance, visual-centric computing, ranging from fundamental primitives such as sorting, searching and algebra to advanced systems of algorithms for computational research and visualization. The current distribution of libCL already contains fully parallelized implementations of the following algorithms:

  • Bounding volume hierarchy construction
  • Smoothed particle hydrodynamics
  • Radix sort
  • Adaptive tone-mapping
  • Screen-space ambient occlusion culling
  • Bilateral and recursive Gaussian filters

libCL grew out of OpenCL Studio and therefore integrates well with that development environment and its visualization capabilities. libCL is Open Source and released under the Apache license.
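Radix sort is the most self-contained entry in the list above. The following is a sequential least-significant-digit radix sort for unsigned 32-bit keys. It is written to expose the three per-digit steps that GPU implementations typically parallelize: a histogram, an exclusive prefix sum, and a stable scatter. It is not libCL's implementation, and the 4-bit digit width is an arbitrary choice.

```python
from typing import List


def radix_sort_u32(keys: List[int], bits_per_pass: int = 4) -> List[int]:
    """Return a sorted copy of unsigned 32-bit keys using LSD radix sort."""
    if any(k < 0 or k >= 1 << 32 for k in keys):
        raise ValueError("keys must be unsigned 32-bit integers")
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, 32, bits_per_pass):
        # 1. Histogram: count keys per digit value.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: first output slot for each digit value.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter: keys keep their relative order within a digit.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys


if __name__ == "__main__":
    print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
```

Stability of the scatter is what makes the digit-by-digit passes compose into a full sort. On a GPU, the histogram and prefix sum are computed per work-group and combined with a parallel scan.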

Non-negative least squares on GPU/multicore architectures

September 4th, 2011


We parallelize, on multi-core architectures, a version of the active-set iterative algorithm derived from the original work of Lawson and Hanson (1974). This algorithm requires the solution of an unconstrained least squares problem in every step of the iteration for a matrix composed of the passive columns of the original system matrix. To achieve improved performance, we use parallelizable procedures to efficiently update and downdate the QR factorization of the matrix at each iteration, to account for inserted and removed columns. We use a reordering strategy of the columns in the decomposition to reduce computation and memory access costs. We consider graphics processing units (GPUs) as a new mode for efficient parallel computations and compare our implementations to that of multi-core CPUs. Both synthetic and non-synthetic data are used in the experiments.

(Yuancheng Luo and Ramani Duraiswami, “Efficient Parallel Non-Negative Least Squares on Multicore Architectures”, SIAM Journal on Scientific Computing, accepted, Sep. 2011. [PDF] [Source code])
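For reference, here is a compact sequential sketch of the Lawson–Hanson active-set iteration the paper builds on. To stay short, it re-solves the least squares subproblem on the passive columns from scratch through the normal equations with Gaussian elimination. This is a deliberate simplification and is numerically weaker than the QR update and downdate scheme the authors parallelize. Matrices are lists of rows, and the function names and tolerance are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def _solve_subproblem(A: Matrix, b: List[float], cols: List[int]) -> List[float]:
    """Least squares on the columns in `cols`: solve (A_P^T A_P) s = A_P^T b.

    Normal equations plus Gaussian elimination with partial pivoting; a
    simplification of the QR-based updates described in the paper.
    """
    m, k = len(A), len(cols)
    # Augmented system [A_P^T A_P | A_P^T b].
    M = [[sum(A[r][ci] * A[r][cj] for r in range(m)) for cj in cols]
         + [sum(A[r][ci] * b[r] for r in range(m))]
         for ci in cols]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            for j in range(c, k + 1):
                M[i][j] -= f * M[c][j]
    s = [0.0] * k
    for i in range(k - 1, -1, -1):
        s[i] = (M[i][k] - sum(M[i][j] * s[j] for j in range(i + 1, k))) / M[i][i]
    return s


def nnls(A: Matrix, b: List[float], tol: float = 1e-10, max_iter: int = 500) -> List[float]:
    """Minimize ||A x - b|| subject to x >= 0 (Lawson-Hanson active set)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    passive: List[int] = []  # indices allowed to be positive
    for _ in range(max_iter):
        # Negative gradient of 0.5 * ||A x - b||^2: w = A^T (b - A x).
        res = [b[r] - sum(A[r][j] * x[j] for j in range(n)) for r in range(m)]
        w = [sum(A[r][j] * res[r] for r in range(m)) for j in range(n)]
        active = [j for j in range(n) if j not in passive]
        if not active or max(w[j] for j in active) <= tol:
            break  # KKT conditions hold
        passive.append(max(active, key=lambda j: w[j]))
        while passive:
            s = [0.0] * n
            for j, v in zip(passive, _solve_subproblem(A, b, passive)):
                s[j] = v
            if all(s[j] > 0 for j in passive):
                x = s
                break
            # Step from x toward s until the first passive variable hits zero.
            blocking = [j for j in passive if s[j] <= 0]
            j_min = min(blocking, key=lambda j: x[j] / (x[j] - s[j]))
            alpha = x[j_min] / (x[j_min] - s[j_min])
            x = [xj + alpha * (sj - xj) for xj, sj in zip(x, s)]
            x[j_min] = 0.0
            # Move variables that reached zero back to the active set.
            passive = [j for j in passive if x[j] > tol]
            for j in range(n):
                if j not in passive:
                    x[j] = 0.0
    return x


if __name__ == "__main__":
    print(nnls([[10.0, 0.6], [0.0, 0.8]], [1.0, 2.0]))  # close to [0.0, 2.2]
```

In the usage example, the unconstrained solution would be [-0.05, 2.5], so the first column is dropped from the passive set and the constrained optimum uses the second column alone.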
