ofgpu v0.2 released: GPU linear solvers for OpenFOAM

September 24th, 2011

The latest release of Symscape’s ofgpu (v0.2) for OpenFOAM® 2.0.x is now available. ofgpu is an open source experimental linear solver library that targets NVIDIA CUDA GPU devices on Windows, Linux, and (untested) Mac OS X. ofgpu now has support for the Cusp preconditioners:

  • smoothed_aggregation – equivalent to Algebraic Multi-Grid (AMG)
  • scaled_bridson_ainv
  • bridson_ainv
  • nonsym_bridson_ainv

Also supported is the option to select the GPU device. For more details see http://www.symscape.com/gpu-0-2-openfoam.

Aparapi – Parallel programming with Java and OpenCL

September 15th, 2011

AMD has just open-sourced Aparapi, a project that started in their JavaLabs team. Aparapi is an API for expressing data-parallel workloads in Java, together with a runtime component that converts the Java bytecode of compatible workloads into OpenCL™ so that it can be executed on a variety of GPU devices. More information can be found in this blog entry.

Thrust: A Productivity-Oriented Library for CUDA

September 12th, 2011

Abstract:

This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.

(Nathan Bell and Jared Hoberock: “Thrust: A Productivity-Oriented Library for CUDA”, GPU Computing Gems, Jade Edition, edited by Wen-mei W. Hwu, October 2011)

An Analysis of the GPU Market

September 10th, 2011

From the abstract of a GPU market analysis whitepaper by Jon Peddie Research:

Computer graphics is hard work. Behind the images you see in games and movies, or while editing photos or video, some serious processing is taking place. All the processing power you can muster is needed to push and polish pixels. And this task is only going to get more demanding as these applications get more sophisticated. Graphics Processing Units (GPUs), which do the heavy lifting in computer graphics, range greatly in size, price and performance. They span from tiny cores inside an ARM processor (such as Nvidia’s Tegra or Qualcomm’s Snapdragon), to graphics integrated within an X86 processor (such as AMD’s Fusion, Intel’s Sandy Bridge), to a standalone discrete device, or dGPU (such as AMD’s Radeon, or Nvidia’s GeForce).

More information: http://jonpeddie.com/media/presentations/an-analysis-of-the-gpu-market/

libCL 1.0 released

September 8th, 2011

libCL is an open-source parallel algorithm library written in C++ and OpenCL. Rather than a specific domain, libCL intends to encompass a wide range of parallel algorithms and data structures. The goal is to provide a comprehensive repository for high performance visual-centric computing ranging from fundamental primitives such as sorting, searching and algebra to advanced systems of algorithms for computational research and visualization. The current distribution of libCL already contains entirely parallelized implementations of the following algorithms:

  • Bounding volume hierarchy construction
  • Smoothed particle hydrodynamics
  • Radix sort
  • Adaptive tone-mapping
  • Screen-space ambient occlusion culling
  • Bilateral and Recursive Gaussian

libCL emerged out of OpenCL Studio, and as such integrates well with the development environment and its visualization capabilities. libCL is Open Source and released under the Apache license.

Non negative least squares on GPU/multicore architectures

September 4th, 2011

Abstract:

We parallelize a version of the active-set iterative algorithm derived from the original works of Lawson and Hanson (1974) on multi-core architectures. This algorithm requires the solution of an unconstrained least squares problem in every step of the iteration for a matrix composed of the passive columns of the original system matrix. To achieve improved performance, we use parallelizable procedures to efficiently update and downdate the QR factorization of the matrix at each iteration, to account for inserted and removed columns. We use a reordering strategy of the columns in the decomposition to reduce computation and memory access costs. We consider graphics processing units (GPUs) as a new mode for efficient parallel computations and compare our implementations to that of multi-core CPUs. Both synthetic and non-synthetic data are used in the experiments.

(Yuancheng Luo and Ramani Duraiswami, “Efficient Parallel Non-Negative Least Squares on Multicore Architectures”, SIAM Journal on Scientific Computing, accepted, Sep. 2011. [PDF] [Source code])

GTC Worldwide Call for Speakers & Posters

September 3rd, 2011

NVIDIA is looking for research posters and speakers for their upcoming events including GTC Express @ SC’11, GTC Asia and GTC U.S. More information about the events, submission procedures and the speaking opportunities can be found here, and the submission system is available at this page.

GPU Implementation of a Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method

September 2nd, 2011

Abstract:

A Helmholtz equation in two dimensions discretized by a second order finite difference scheme is considered. Krylov methods such as Bi-CGSTAB and IDR(s) have been chosen as solvers. Since the convergence of the Krylov solvers deteriorates with increasing wave number, a shifted Laplace multigrid preconditioner is used to improve the convergence. The implementation of the preconditioned solver on CPU (Central Processing Unit) is compared to an implementation on GPU (Graphics Processing Units or graphics card) using CUDA (Compute Unified Device Architecture). The results show that preconditioned Bi-CGSTAB on GPU as well as preconditioned IDR(s) on GPU is about 30 times faster than on CPU for the same stopping criterion.

(H. Knibbe, C.W. Oosterlee and C. Vuik, “GPU implementation of a Helmholtz Krylov solver preconditioned by a shifted Laplace multigrid method”, accepted for publication in the Journal of Computational and Applied Mathematics, 2011. [DOI])

Fast Hough Transform on GPUs: Exploration of Algorithm Trade-offs

August 29th, 2011

Abstract:

The Hough transform is a commonly used algorithm to detect lines and other features in images. It is robust to noise and occlusion, but has a large computational cost. This paper introduces two new implementations of the Hough transform for lines on a GPU. One focuses on minimizing processing time, while the other has an input-data independent processing time. Our results show that optimizing the GPU code for speed can achieve a speed-up over naive GPU code of about 10x. The implementation which focuses on processing speed is the faster one for most images, but the implementation which achieves a constant processing time is quicker for about 20% of the images.

(Gert-Jan van den Braak, Cedric Nugteren, Bart Mesman and Henk Corporaal: “Fast Hough Transform on GPUs: Exploration of Algorithm Trade-offs”. In: Advanced Concepts for Intelligent Vision Systems, Lecture Notes in Computer Science, Vol. 6915, pp.611-622, 2011. [DOI])
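
For readers unfamiliar with the algorithm, the voting step at the heart of a line Hough transform can be sketched on the CPU as follows. This is a minimal illustration of the textbook (theta, rho) accumulator, not either of the GPU implementations from the paper; the function names, the one-degree angular resolution and the tiny test image are assumptions made for the example.

```python
import math

def hough_lines(points, width, height, n_theta=180):
    """Let every edge pixel vote for all lines (theta, rho) passing through it.

    Hypothetical helper for illustration: theta is sampled in n_theta steps
    over [0, pi), and rho = x*cos(theta) + y*sin(theta) is rounded to an integer.
    """
    max_rho = int(math.ceil(math.hypot(width, height)))
    # rho can be negative, so bin index = rho + max_rho
    acc = [[0] * (2 * max_rho + 1) for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[t][rho + max_rho] += 1
    return acc, max_rho

def strongest_bin(acc, max_rho):
    """Return (votes, theta_index, rho) of one bin holding the maximum vote count."""
    return max(
        (votes, t, r - max_rho)
        for t, row in enumerate(acc)
        for r, votes in enumerate(row)
    )

# Ten edge pixels on the horizontal line y = 5 in a 10x10 image
points = [(x, 5) for x in range(10)]
acc, max_rho = hough_lines(points, 10, 10)
votes, theta_index, rho = strongest_bin(acc, max_rho)
print(votes, rho)
```

Every pixel touches every angle bin, so the work grows with the number of edge pixels times the number of angles. That is the large computational cost the abstract refers to. On a short line like this one, several neighbouring angle bins tie for the maximum, so a practical detector would use a finer rho resolution or non-maximum suppression.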

HPG11 papers available

August 27th, 2011

All papers and presentations from High Performance Graphics 2011 are now available online, including the keynote presentations and the Hot3D track.

HPG11 was held in Vancouver earlier this month.
