February 28th, 2010
February 21st, 2010
We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show that our code scales well on an MPI-based cluster; that an eightfold speedup can be achieved using modern GPUs compared with multithreaded CPU code; and, finally, that it is possible to solve fluid-structure interaction scenarios with high resolution at interactive rates.
(Markus Geveler, Dirk Ribbrock, Dominik Göddeke and Stefan Turek: “Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors”, Accepted in: Facing the Multicore Challenge, Heidelberg, Germany, Mar. 2010.)
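As a rough illustration of the Lattice-Boltzmann side of this work: each cell carries nine distribution functions on a D2Q9 lattice, whose zeroth and first moments give the water depth h and the momentum h·u, and the collision step relaxes them toward an equilibrium. The Python sketch below assumes a widely used shallow-water equilibrium and a single-relaxation-time (BGK) collision; the direction ordering, the lattice speed e and the function names are our own choices, and it is not the authors' multi-architecture code (streaming, boundaries and fluid-structure coupling are omitted).

```python
# D2Q9 directions: rest, then axis-aligned (odd index) and diagonal (even index).
# This ordering is an assumption made for the sketch.
DIRS = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1),
        (-1, 0), (-1, -1), (0, -1), (1, -1)]


def equilibrium(h, ux, uy, g=9.81, e=1.0):
    """Equilibrium distributions for depth h and velocity (ux, uy).

    e is the lattice speed dx/dt, g the gravitational acceleration.
    """
    usq = ux * ux + uy * uy
    e2 = e * e
    feq = []
    for a, (cx, cy) in enumerate(DIRS):
        eu = e * (cx * ux + cy * uy)  # e_a . u
        if a == 0:
            f = h - 5.0 * g * h * h / (6.0 * e2) - 2.0 * h * usq / (3.0 * e2)
        elif a % 2 == 1:  # axis-aligned directions
            f = (g * h * h / (6.0 * e2) + h * eu / (3.0 * e2)
                 + h * eu * eu / (2.0 * e2 * e2) - h * usq / (6.0 * e2))
        else:  # diagonal directions
            f = (g * h * h / (24.0 * e2) + h * eu / (12.0 * e2)
                 + h * eu * eu / (8.0 * e2 * e2) - h * usq / (24.0 * e2))
        feq.append(f)
    return feq


def moments(f, e=1.0):
    """Recover depth h and momentum (h*ux, h*uy) from the distributions."""
    h = sum(f)
    hux = sum(fa * e * cx for fa, (cx, _) in zip(f, DIRS))
    huy = sum(fa * e * cy for fa, (_, cy) in zip(f, DIRS))
    return h, hux, huy


def bgk_collide(f, tau, g=9.81, e=1.0):
    """BGK collision: relax each distribution toward equilibrium with time tau."""
    h, hux, huy = moments(f, e)
    feq = equilibrium(h, hux / h, huy / h, g, e)
    return [fa - (fa - fe) / tau for fa, fe in zip(f, feq)]
```

A quick sanity check is that the equilibrium reproduces h and h·u exactly, and that the collision step leaves both moments unchanged.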
February 14th, 2010
WaveTomography is a 2D time-domain waveform tomography reconstruction algorithm that can be run on graphics processing units. It features:
- Wave propagation using leapfrog and ONADM schemes.
- First-order absorbing boundary conditions.
- CPU-only and CPU/GPU implementations.
- Flexible reconstruction strategy (choice of emitters and receivers at each iteration).
- Flexible imaging setup (choice of transducers’ positions).
The WaveTomography package also includes a standalone simulator for wave propagation. The source code can be freely downloaded.
(Roy, O., Jovanovic, I., Hormati, A., Parhizkar, R., and Vetterli, M., “Sound speed estimation using wave-based ultrasound tomography: Theory and GPU implementation”, in Proc. SPIE Medical Imaging, 2010.)
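To make the leapfrog idea concrete, here is a one-dimensional simplification in Python; the package itself works in 2D and also offers ONADM, which is not shown. We assume zero initial velocity and a first-order absorbing condition of Mur type at both ends; the function name and its parameters are hypothetical.

```python
def leapfrog_1d(u0, steps, courant):
    """Advance u_tt = c^2 u_xx on a 1D grid with the explicit leapfrog scheme.

    u0      -- initial displacement; the initial velocity is assumed to be zero
    steps   -- number of time steps to take (>= 1)
    courant -- c*dt/dx; the scheme is stable only for courant <= 1
    Both ends use a first-order (Mur-type) absorbing boundary condition.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant number must be in (0, 1]")
    n = len(u0)
    r2 = courant * courant
    k = (courant - 1.0) / (courant + 1.0)  # Mur coefficient
    prev = list(u0)
    # Start-up step from a Taylor expansion with zero initial velocity.
    curr = list(u0)
    for i in range(1, n - 1):
        curr[i] = prev[i] + 0.5 * r2 * (prev[i + 1] - 2.0 * prev[i] + prev[i - 1])
    for _ in range(steps - 1):
        nxt = [0.0] * n
        for i in range(1, n - 1):
            # Centered second differences in both time and space.
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + r2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        # Mur boundary: outgoing waves leave with little reflection.
        nxt[0] = curr[1] + k * (nxt[1] - curr[0])
        nxt[-1] = curr[-2] + k * (nxt[-2] - curr[-1])
        prev, curr = curr, nxt
    return curr
```

With courant = 1 the 1D leapfrog update reproduces the d'Alembert solution exactly at the grid points and the Mur coefficient vanishes, so an initial pulse splits into two half-amplitude pulses that leave the domain without reflection, which makes this setting a convenient check.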
February 10th, 2010
OpenNL (Open Numerical Library) is a library for solving sparse linear systems, especially designed for the Computer Graphics community. The goal of OpenNL is to be as small as possible while offering the subset of functionality required by this application field. OpenNL's Makefiles can generate a single .c file and a single .h file, which makes the library very easy to integrate into other projects. The distribution includes an implementation of the Least Squares Conformal Maps (LSCM) parameterization method. The new version 3.0 of OpenNL adds support for CUDA (with Concurrent Number Cruncher and CUSP ELL formats).
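The ELL (ELLPACK) format mentioned above pads every row of a sparse matrix to the same number of stored entries, which gives GPUs a regular memory-access pattern. A minimal Python sketch of the format and of the matrix-vector product it supports follows; the padding convention (value 0.0, column 0) and the function names are assumptions, and neither CUSP's nor OpenNL's actual data layout or API is reproduced.

```python
def dense_to_ell(a):
    """Convert a dense matrix (list of rows) to ELLPACK value/column arrays.

    Every row is padded to the maximum number of nonzeros in any row.
    Padding uses value 0.0 and column index 0, so it adds nothing to A*x.
    """
    width = max(sum(1 for v in row if v != 0) for row in a)
    values, cols = [], []
    for row in a:
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        nz += [(0, 0.0)] * (width - len(nz))
        cols.append([j for j, _ in nz])
        values.append([v for _, v in nz])
    return values, cols


def ell_spmv(values, cols, x):
    """Compute y = A*x from ELLPACK storage, one row at a time.

    On a GPU, one thread typically handles one row, and the arrays are
    stored column-major so that adjacent threads read adjacent addresses.
    """
    y = []
    for vrow, crow in zip(values, cols):
        s = 0.0
        for v, j in zip(vrow, crow):
            s += v * x[j]
        y.append(s)
    return y
```

For example, the 3×3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] is stored with a row width of 2, and multiplying it by [1, 2, 3] gives [7, 6, 17].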
February 9th, 2010
In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations, moving this burden off the main processor. In this paper we describe our new GPU scheme, which achieves a very substantial improvement in recognition speed while incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models of varying complexity, and the results consistently show that using the GPU reduces the recognition time, with the largest improvements occurring in systems with large numbers of Gaussians. For the systems that achieve the best accuracy we obtained speed-ups of between 2.5 and 3 times. The faster decoding times translate into reductions in space, power and hardware costs, because only standard hardware that is already widely installed is required.
(Paul R. Dixon, Tasuku Oonishi, Sadaoki Furui, “Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition”, Computer Speech & Language, Volume 23, Issue 4, October 2009, Pages 510-526, ISSN 0885-2308, DOI: 10.1016/j.csl.2009.03.005)
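In GMM/HMM-based recognizers, the acoustic computation is typically the evaluation of Gaussian mixture log-likelihoods for each frame, which is why the number of Gaussians drives the cost. The Python sketch below shows that per-frame computation for one mixture, assuming diagonal covariances; it illustrates the arithmetic being offloaded, not the authors' GPU scheme, and the function name is hypothetical.

```python
import math


def log_gmm_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights[m], means[m][d] and variances[m][d] describe component m.
    The log-sum-exp trick avoids underflow when combining component scores.
    """
    scores = []
    for w, mu, var in zip(weights, means, variances):
        s = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            # Per-dimension log density of a 1D Gaussian.
            s -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        scores.append(s)
    best = max(scores)
    return best + math.log(sum(math.exp(s - best) for s in scores))
```

Because each component score is independent of the others, a natural GPU mapping parallelizes over Gaussians, mixtures and frames and only then combines the scores.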
February 8th, 2010
NVIDIA and Editor-in-Chief Professor Wen-mei Hwu of the University of Illinois at Urbana-Champaign invite you to submit articles for GPU Computing Gems, a contribution-based book that will focus on practical techniques for GPU computing. The book continues the popular GPU Gems series.
February 8th, 2010
The HiBi workshop establishes a forum to link researchers in the areas of parallel computing and computational systems biology. One of the main limitations in managing models of biological systems comes from the fundamental difference between the high parallelism evident in biochemical reactions and the sequential environments employed for the analysis of these reactions. Such limitations affect all varieties of continuous, deterministic, discrete and stochastic models, undermining the applicability of simulation techniques and the analysis of biological models. The goal of HiBi is therefore to bring together researchers in the fields of high performance computing and computational systems biology. Experts from around the world will present their current work and discuss profound challenges, new ideas, results, applications and their experience relating to key aspects of high performance computing in biology.
Topics of interest include, but are not limited to:
- Parallel stochastic simulation
- Biological and numerical parallel computing
- Parallel and distributed architectures
- Emerging processing architectures: Cell processors, GPUs, mixed CPU-FPGA, etc.
- Parallel model checking techniques
- Parallel parameter estimation
- Parallel algorithms for biological analysis
- Application of concurrency theory to biology
- Parallel visualization algorithms
- Web-services and Internet computing for e-Science
- Tools and applications
More Information: http://www.cosbi.eu/hibi2010/
February 7th, 2010
The symposium will provide technical presentations from the companies advancing the development of GPUs, discussions of the challenges involved in effectively programming GPUs, and presentations on the use of GPUs in a range of chemical applications.
The deadline for submissions is 04/05/2010, and more information can be found at http://illinois.edu/lb/article/2101/33709.
February 6th, 2010
High-Performance Graphics 2010 continues last year’s success at synthesizing two important and cutting-edge topics in computer graphics, the previous Graphics Hardware and Interactive Ray Tracing conferences. The scope of the conference is the overarching field of performance-oriented graphics systems, covering innovative algorithms, efficient implementations, and hardware architecture. This broader focus offers a common forum bringing together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications.
The program features three days of paper and industry presentations, with ample time for discussions during breaks, lunches, and the conference banquet. The conference, which will take place on June 25-27, is co-located with the Eurographics Symposium on Rendering on the campus of the Max-Planck-Institut für Informatik, Saarland University, Saarbrücken, Germany.
Original and innovative performance-oriented contributions are invited from all areas of graphics, including hardware architectures, rendering, physics, animation, AI, simulation, and data structures, with topics including (but not limited to):
- New graphics hardware architectures
- Rendering architectures and algorithms
- Parallel computing for graphics (including GPU Computing)
- Algorithmic foundations
- Languages and compilation
The conference website with additional information is located at http://www.highperformancegraphics.org.
February 2nd, 2010
Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial Graphics Processing Units (GPUs) in a standard PC. Our implementation is based on a divide-and-conquer recursive TMI algorithm, efficiently adapted to the GPU architecture. It obtains a speedup of 34x over a CPU-based LAPACK reference routine, and runs at up to 54 GFLOPS on a GTX 280 in double precision. Limitations of the algorithm are discussed, and strategies to cope with them are introduced. In addition, we show how the inversion of an L- and a U-matrix can be performed concurrently on a GTX 295 based dual-GPU system at up to 90 GFLOPS.
(Florian Ries, Tommaso De Marco, Matteo Zivieri and Roberto Guerrieri, Triangular Matrix Inversion on Graphics Processing Units, Supercomputing 2009, DOI 10.1145/1654059.1654069)
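The divide-and-conquer idea behind recursive TMI follows from the 2×2 block structure of a triangular matrix. The pure-Python sketch below shows that recursion for a lower-triangular matrix; it only illustrates the algebra (the authors' GPU adaptation, blocking thresholds and double-precision kernels are not reproduced), and the function names are our own.

```python
def matmul(a, b):
    """Plain dense matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def invert_lower(l):
    """Invert a nonsingular lower-triangular matrix by recursive blocking.

    With L = [[A, 0], [C, B]] (A and B lower triangular):
    inv(L) = [[inv(A), 0], [-inv(B) * C * inv(A), inv(B)]].
    """
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    h = n // 2
    a = [row[:h] for row in l[:h]]   # top-left block
    c = [row[:h] for row in l[h:]]   # bottom-left block
    b = [row[h:] for row in l[h:]]   # bottom-right block
    ai = invert_lower(a)
    bi = invert_lower(b)
    # Off-diagonal block: two general matrix products, which is where most
    # of the work goes for large matrices.
    off = matmul(bi, matmul(c, ai))
    top = [ai[i] + [0.0] * (n - h) for i in range(h)]
    bottom = [[-v for v in off[i]] + bi[i] for i in range(n - h)]
    return top + bottom
```

Because the recursion turns most of the work into matrix-matrix products, it maps naturally onto hardware that is fast at such products, which is the property a GPU adaptation can exploit.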
We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI’s libraries, we achieve a twofold speedup over straightforward C++ code using HONEI’s SSE backend, and a further 3–4 times and 4–16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for the development and evaluation of such kernels, significantly simplifying their development.
(Danny van Dyk, Markus Geveler, Sven Mallach, Dirk Ribbrock, Dominik Göddeke and Carsten Gutwenger: HONEI: A collection of libraries for numerical computations targeting multiple processor architectures. Computer Physics Communications 180(12), pp. 2534-2543, December 2009. DOI 10.1016/j.cpc.2009.04.018)
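The hardware-abstraction idea (an application calls an operation, a backend-specific kernel is chosen behind the scenes, and optimised kernels can be registered later) can be sketched in a few lines of Python. All names here, including the backend tags, are hypothetical; HONEI's real C++ interface is not reproduced.

```python
GENERIC = "generic"
_KERNELS = {}  # (operation name, backend tag) -> implementation


def register(op, backend):
    """Decorator that registers an implementation of `op` for `backend`."""
    def wrap(fn):
        _KERNELS[(op, backend)] = fn
        return fn
    return wrap


def resolve(op, backend):
    """Pick the kernel for `backend`, falling back to the generic one."""
    fn = _KERNELS.get((op, backend))
    return fn if fn is not None else _KERNELS[(op, GENERIC)]


def call(op, backend, *args):
    """Application-facing entry point: the caller never names a kernel."""
    return resolve(op, backend)(*args)


@register("scaled_sum", GENERIC)
def scaled_sum_generic(y, a, x):
    # Reference kernel computing y + a*x element by element.
    return [yi + a * xi for yi, xi in zip(y, x)]


@register("scaled_sum", "sse")
def scaled_sum_sse(y, a, x):
    # Placeholder for a vectorized kernel; it must match the reference result.
    out = list(y)
    for i, xi in enumerate(x):
        out[i] += a * xi
    return out
```

Registering a specialised kernel for one backend changes which code runs without touching the calling application, which mirrors the extension mechanism described in the abstract.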