PASCO 2010: Call for Papers

March 9th, 2010

The International Workshop on Parallel and Symbolic Computation (PASCO) is a series of workshops dedicated to the promotion and advancement of parallel algorithms and software in all areas of symbolic mathematical computation. The pervasive ubiquity of parallel architectures and memory hierarchy has led to the emergence of a new quest for parallel mathematical algorithms and software capable of exploiting the various levels of parallelism: from hardware acceleration technologies (multi-core and multi-processor system on chip, GPGPU, FPGA) to cluster and global computing platforms. To push up the limits of symbolic and algebraic computations, beyond the optimization of the application itself, the effective use of a large number of resources -memory and general or specialized computing units- is expected to enhance the performance multi-criteria objectives: time, energy consumption, resource usage, reliability. In this context, the design and the implementation of mathematical algorithms with provable and adaptive performances is a major challenge.

The workshop PASCO 2010 will be a three-day event including invited presentations and tutorials, contributed research papers and posters, and a programming contest. Specific topics include, but are not limited to: Read the rest of this entry »

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid

March 3rd, 2010


We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular single precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.

(Dominik Göddeke and Robert Strzodka: “Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed Precision Multigrid” , accepted in: IEEE Transactions on Parallel and Distributed Systems, Special Issue: High Performance Computing with Accelerators, Mar. 2010. Link.)

Easy GPU programming with GMAC

March 1st, 2010

GMAC (Global Memory for ACcelerators) is a user-level library that implements an Asymmetric Distributed Shared Memory model to be used by CUDA programs. An ADSM model allows CPU code to access data hosted in accelerator (GPU) memory. In this model, a single pointer is used for data structures accessed both in the CPU and the GPU and the coherency of the data is transparently handled by the library. Moreover, the data allocated with GMAC can be accessed by all the host threads of the program. That makes your code simpler and cleaner. GMAC currently supports programs programmed with CUDA, but OpenCL support is planned.

A paper describing the Asymmetric Distributed Shared Memory model and its implementation in GMAC has been accepted in the ASPLOS XV conference. GMAC is being developed by the Operating System Group at the Universitat Politecnica de Catalunya and the IMPACT Research Group at the University of Illinois. Binary pre-compiled packages, the source code, documentation and examples are available at the project website.

(Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu,  “An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems”, accepted in: Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010), March 2010.)

Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors

February 28th, 2010


We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show that our code scales well on an MPI-based cluster; that an eightfold speedup can be achieved using modern GPUs in contrast to multithreaded CPU code and, finally, that it is possible to solve fluid-structure interaction scenarios with high resolution at interactive rates.

(Markus Geveler, Dirk Ribbrock, Dominik Göddeke and Stefan Turek: “Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors”, Accepted in: Facing the Multicore Challenge, Heidelberg, Germany, Mar. 2010. Link.)

WaveTomography v1.0: 2D waveform tomography reconstruction

February 21st, 2010

WaveTomography is a 2D time-domain waveform tomography reconstruction algorithm that can be run on graphics processing units. It features:

  • Wave propagation using leapfrog and ONADM schemes.
  • First order absorbing boundary conditions.
  • CPU only and CPU/GPU implementations.
  • Flexible reconstruction strategy (choice of emitters and receivers at each iteration).
  • Flexible imaging setup (choice of transducers’ positions).

The WaveTomography package also includes a standalone simulator for wave propagation. The source code can be freely downloaded.

(Roy, O., Jovanovic, I., Hormati, A., and Parhizkar, R., and Vetterli, M., “Sound speed estimation using wave-based ultrasound tomography: Theory and GPU implementation”, in Proc. SPIE Medical Imaging, 2010.)

OpenNL 3.0: CUDA sparse linear solvers

February 14th, 2010

OpenNL (Open Numerical Library) is a library for solving sparse linear systems, especially designed for the Computer Graphics community. The goal of OpenNL is to be as small as possible, while offering the subset of functionalities required by this application field. The Makefiles of OpenNL can generate a single .c and .h file that make it very easy to integrate into other projects. The distribution includes an implementation of a Least Squares Conformal Maps parameterization method.  The new version 3.0 of OpenNL includes support for CUDA (with Concurrent Number Cruncher and CUSP ELL formats).

Harnessing Graphics Processors for the Fast Computation of Acoustic Likelihoods in Speech Recognition

February 10th, 2010


In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations to move this burden off the main processor. In this paper we describe our new GPU scheme that can achieve a very substantial improvement in recognition speed whilst incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models with varying complexity and the results consistently show by using the GPU it is possible to reduce the recognition time with largest improvements occurring in systems with large numbers of Gaussians. For the systems which achieve the best accuracy we obtained between 2.5 and 3 times speed-ups. The faster decoding times translate to reductions in space, power and hardware costs by only requiring standard hardware that is already widely installed.

(Paul R. Dixon, Tasuku Oonishi, Sadaoki Furui, “Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition”, Computer Speech & Language, Volume 23, Issue 4, October 2009, Pages 510-526, ISSN 0885-2308, DOI: 10.1016/j.csl.2009.03.005)

CfP: GPU Computing Gems

February 9th, 2010

NVIDIA and Editor-in-Chief  Professor Wen-mei Hwu of the University of Illinois, Urbana-Champaign invite you to submit articles for GPU Computing Gems, a contribution-based book that will focus on practical techniques for GPU computing. This is a continuation of the popular GPU Gems series.

The full Call for Participation is available here.

CfP: High performance computational systems Biology

February 8th, 2010

The HiBi workshop establishes a forum to link researchers in the areas of parallel computing and computational systems biology. One of the main limitations in managing models of biological systems comes from the fundamental difference between the high parallelism evident in biochemical reactions and the sequential environments employed for the analysis of these reactions. Such limitations affect all varieties of continuous, deterministic, discrete and stochastic models; undermining the applicability of simulation techniques and analysis of biological models. The goal of HiBi is therefore to bring together researchers in the fields of high performance computing and computational systems biology. Experts from around the world will present their current work, discuss
profound challenges, new ideas, results, applications and their experience relating to key aspects of high performance computing in biology.

Topics of interest include, but are not limited to:

  • Parallel stochastic simulation
  • Biological and Numerical parallel computing
  • Parallel and distributed architectures
  • Emerging processing architecture: Cell processors, GPUs, mixed CPU-FPGA, etc.
  • Parallel model checking techniques
  • Parallel parameter estimation
  • Parallel algorithms for biological analysis
  • Application of concurrency theory to biology
  • Parallel visualization algorithms
  • Web-services and Internet computing for e-Science
  • Tools and applications

More Information:

CfP: Symposium on chemical computations on GP-GPUs

February 8th, 2010

The symposium will provide technical presentations from the companies advancing the development of GPUs, discussions of the challenges involved in effectively programming GPUs, and presentations on the use of GPUs in a range of chemical applications.

The deadline for submissions is 04/05/2010, and more information can be found at