IWOCL 2015 OpenCL Developer Conference and Advanced OpenCL Tutorial

April 22nd, 2015

Stanford, CA – 21 April 2015. The organisers of IWOCL (“eye-wok-ul”), the International Workshop on OpenCL, today announced that AMD and HP have sponsored the Advanced Hands-On OpenCL Tutorial that will kick off IWOCL 2015. The tutorial, which will focus on advanced OpenCL concepts, is an extension of the highly successful ‘Hands on OpenCL’ course, which has received over 3,000 downloads. Simon McIntosh-Smith, Senior Lecturer in High Performance Computing and Architectures at the University of Bristol and one of the authors of the original open-source course, will lead the tutorial.

The full-day Advanced Hands-On OpenCL tutorial takes place on Monday 11th May at the Li Ka Shing Center, Stanford University. Registration is $145. For additional information visit: http://www.iwocl.org/conf-2015/handsonopencl-tutorial/

GPU-Accelerated Inter-Cell Interference Coordination for LTE

April 21st, 2015


To minimize interference in LTE networks, several inter-cell interference coordination (ICIC) techniques have been introduced. Among them, semi-static ICIC offers a balanced trade-off between applicability and system performance. The power allocation per resource block and cell is adapted in the range of seconds according to the load in the system. An open issue in the literature is the question of how fast the adaptation should be performed. This leads basically to a trade-off between system performance and feasible computation times of the associated power allocation problems. In this work, we close this open issue by studying the impact that different update periods of semi-static ICIC have on the system performance. We conduct our study on realistic scenarios that also consider the mobility of mobile terminals. Secondly, we consider the implementation aspects of semi-static ICIC. We introduce a very efficient implementation on general-purpose graphics processing units, harnessing the parallel computing capability of such devices. We show that the update periods have a significant impact on the performance of cell edge terminals. Additionally, we present a graphics processing unit (GPU) based implementation which speeds up existing implementations by up to a factor of 92.

Parruca, Donald and Aizaz, Fahad and Chantaraskul, Soamsiri and Gross, James. “Semi-static Interference Coordination in OFDMA/LTE Networks: Evaluation of Practical Aspects.” In Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pp. 87-94, 2014.

Accelerate OpenFOAM® with Culises

April 13th, 2015

Culises significantly accelerates your OpenFOAM® application by using GPUs for the computationally most intensive tasks.

Its main features are

  • Library for GPU-based acceleration of OpenFOAM®
  • Multi-GPU support, significantly reduced computing times
  • Highly efficient state-of-the-art iterative solvers like AMG
  • Quick and easy installation, no validation necessary
  • Flexible interfaces to customer-specific software/engineering applications available

The acceleration of the linear solver by Culises is greater than 2x. The overall speedup depends on the type of application and the time spent in the linear solver. Culises may be tested on FluiDyna’s purpose-built workstation to determine the acceleration potential for your individual OpenFOAM® application. Find out more at: www.culises.com
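The dependence of the overall speedup on the time spent in the linear solver can be estimated with Amdahl's law. The sketch below is only an illustration: the function name and the solver-time fractions are hypothetical, and 2.0 is used simply because it is the lower bound quoted above, not a measured Culises result.

```python
def overall_speedup(solver_fraction: float, solver_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when only the linear solver is accelerated.

    solver_fraction -- share of the original runtime spent in the linear solver (0..1)
    solver_speedup  -- factor by which the solver alone becomes faster
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if solver_speedup <= 0.0:
        raise ValueError("solver_speedup must be positive")
    # Unaccelerated part keeps its runtime; the solver part shrinks by solver_speedup.
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


if __name__ == "__main__":
    # Hypothetical fractions of runtime spent in the linear solver.
    for fraction in (0.3, 0.5, 0.8):
        gain = overall_speedup(fraction, 2.0)
        print(f"{fraction:.0%} of runtime in solver: {gain:.2f}x overall")
```

Under these assumptions, an application that spends only a small share of its runtime in the linear solver sees an overall gain well below the solver's own speedup, which is why the potential has to be determined per application.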

RapidCFD: open-source CFD for GPUs

April 13th, 2015

A new open-source CFD project has just been published. RapidCFD uses NVIDIA CUDA for the entire calculation process, which gives a significant reduction in computation time.


  • most incompressible and compressible solvers on static mesh are available
  • all the calculations are done on the GPU
  • no overhead for GPU-CPU memory copy
  • can run in parallel on multiple GPUs

Visit RapidCFD project page.

CfP: Journal of Real-Time Image Processing (JRTIP), Special Issue on Heterogeneous Real-Time Image Processing

April 13th, 2015

Mobile devices, such as phones and tablets, offer a plethora of media-rich applications such as photo and video recording and editing, natural user interfaces and computer vision. Other areas of embedded image systems are characterized by close-to-sensor processing, such as advanced driver assistance systems, mobile scanners, and smart devices used in medical and industrial imaging. This demands highest computing capabilities at stringent resource and power budgets as well as hard real-time constraints.

Future scaling of computing performance mandates dramatically improving energy efficiency of image systems. One recognized trend is to use heterogeneous hardware such as big.LITTLE cores and accelerators such as DSPs, embedded GPUs, FPGAs, or dedicated hardware. Another trend is to use new 3D integrated circuit technologies that allow for tighter integration of compute cores, memory and sensors to reduce communication latency and improve bandwidth, leading to lower energy consumption.

This calls for novel methodologies for designing heterogeneous hardware, as well as for shielding software developers from growing complexity and allowing them to concentrate on algorithm development rather than on low-level implementation details.

Scalable Partitioning for Parallel Position Based Dynamics

April 13th, 2015


We introduce a practical partitioning technique designed for parallelizing Position Based Dynamics and exploiting the ubiquitous multi-core processors present in current commodity GPUs. The input is a set of particles whose dynamics is influenced by spatial constraints. In the initialization phase, we build a graph in which each node corresponds to a constraint and two constraints are connected by an edge if they influence at least one common particle. We introduce a novel greedy algorithm for inserting additional constraints (phantoms) in the graph such that the resulting topology is q̂-colourable, where q̂ ≥ 2 is an arbitrary number. We color the graph, and the constraints with the same color are assigned to the same partition. Then, the set of constraints belonging to each partition is solved in parallel during the animation phase. We demonstrate that, by using our partitioning technique, the performance hit caused by the GPU kernel calls is significantly decreased, leaving unaffected the visual quality, robustness and speed of serial position based dynamics.

(Fratarcangeli M and Pellacini F, Scalable Partitioning for Parallel Position Based Dynamics, Computer Graphics Forum (Special Issue of Eurographics 2015 Conference). Vol. 34(2) 2015)
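The graph construction and colouring steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses a plain first-fit greedy colouring and does not implement the phantom-constraint insertion that bounds the number of colours. All function names and the example constraints are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_constraint_graph(constraints):
    """Adjacency sets: constraints i and j are adjacent if they share a particle."""
    by_particle = defaultdict(list)
    for index, particles in enumerate(constraints):
        for particle in set(particles):
            by_particle[particle].append(index)
    adjacency = {index: set() for index in range(len(constraints))}
    for sharing in by_particle.values():
        for a, b in combinations(sharing, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def greedy_colouring(adjacency):
    """First-fit colouring: smallest colour not used by an already coloured neighbour."""
    colour = {}
    for node in sorted(adjacency):
        taken = {colour[n] for n in adjacency[node] if n in colour}
        c = 0
        while c in taken:
            c += 1
        colour[node] = c
    return colour


def partition_by_colour(colour):
    """Group constraint indices by colour; each group touches disjoint particles."""
    partitions = defaultdict(list)
    for node, c in sorted(colour.items()):
        partitions[c].append(node)
    return [partitions[c] for c in sorted(partitions)]


if __name__ == "__main__":
    # Hypothetical input: distance constraints along a chain of five particles.
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
    graph = build_constraint_graph(chain)
    for number, group in enumerate(partition_by_colour(greedy_colouring(graph))):
        print(f"partition {number}: constraints {group}")
```

Because no two constraints in a partition share a particle, every constraint in a partition can be processed by its own thread without write conflicts, and the partitions are processed one after another. With plain first-fit colouring the number of partitions depends on the graph, which is the cost the phantom constraints in the paper are meant to control.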

HUCAA2015 CFP: Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications

April 13th, 2015

4th International Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA 2015)

Sept. 8-11, 2015 – Chicago, IL, US
In conjunction with IEEE CLUSTER 2015
IEEE International Conference on Cluster Computing


The workshop on Heterogeneous and Unconventional Cluster Architectures
and Applications aims to gather recent work on heterogeneous and
unconventional cluster architectures and applications, which might
have an impact on future mainstream cluster architectures. This
includes any cluster architecture that is not based on the usual
commodity components and therefore makes use of some special hard- or
software elements, or that is used for special and unconventional
applications.

CFP: 8th Workshop on UnConventional High Performance Computing 2015

April 13th, 2015

Recent issues with the power consumption of conventional HPC hardware have resulted in new interest both in accelerator hardware and in the use of mass-market hardware not originally designed for HPC. The most prominent examples are GPUs, but FPGAs, DSPs and embedded designs are also possible candidates to provide higher power efficiency, as they are used in energy-restricted environments, such as smartphones or tablets. The so-called “dark silicon” forecast, i.e. that not all transistors may be active at the same time, may lead to even more specialized hardware in future mass-market products. Exploiting this hardware for HPC can be a worthwhile challenge.

UCHPC is held in conjunction with Euro-Par 2015, August 24/25 in Vienna, Austria. More information, including the full call for papers, submission instructions and important dates: uchpc15.lrr.in.tum.de

OpenCL Training Course in Calgary, AB – May 26, 2015

April 13th, 2015

Acceleware’s next OpenCL course takes place in Calgary. This professional four-day course is designed for programmers who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the data-parallel processing capabilities of GPUs. Register before May 12 if you would like to reserve a spot. To find out what the course includes, visit: www.acceleware.com

CfP: 3rd Workshop on Parallel and Distributed Agent-Based Simulations (PADABS 2015)

April 13th, 2015

The 3rd Workshop on Parallel and Distributed Agent-Based Simulations (PADABS) is a satellite workshop of Euro-Par 2015 (Vienna, Austria, 24-28 August 2015).

Agent-Based Simulation Models are an increasingly popular tool for research and management in many fields such as ecology, economics and sociology. In some fields, such as the social sciences, these models are seen as a key instrument of the generative approach, essential for understanding complex social phenomena. The relevance and effectiveness of Agent-Based Simulation Models has also recently been recognized in policy-making, biology, military simulations, control of mobile robots and economics.
