OpenNL 3.0: CUDA sparse linear solvers
February 14th, 2010
OpenNL (Open Numerical Library) is a library for solving sparse linear systems, designed especially for the Computer Graphics community. The goal of OpenNL is to be as small as possible while offering the subset of functionality required by this application field. The Makefiles of OpenNL can generate a single .c file and a single .h file, which makes the library very easy to integrate into other projects. The distribution includes an implementation of a Least Squares Conformal Maps parameterization method. The new version 3.0 of OpenNL includes support for CUDA (with Concurrent Number Cruncher and CUSP ELL formats).
CUDPP Users: Please Complete This Survey!
February 11th, 2010
The developers of the CUDPP (CUDA Data-Parallel Primitives) Library request that users (past and current) of the CUDPP Library fill out the CUDPP Survey. This survey will help the CUDPP Team prioritize new development and support for existing and new features.
gDEBugger for OpenCL – Beta Program
February 10th, 2010
Graphic Remedy is proud to announce the upcoming release of gDEBugger for OpenCL on Windows, Mac OS X and Linux. This new product will bring gDEBugger’s advanced Debugging, Profiling and Memory Analysis abilities to the OpenCL developer’s world, helping OpenCL developers find bugs and optimize parallel computing application performance and memory consumption.
To join the Free Beta Program, see screenshots and more details, please visit http://www.gremedy.com/gDEBuggerCL.php.
gDEBugger CL enables OpenCL developers to:
- Locate parallel computing performance bottlenecks
- Edit and continue OpenCL kernels “on the fly”
Harnessing Graphics Processors for the Fast Computation of Acoustic Likelihoods in Speech Recognition
February 10th, 2010
Abstract:
In large vocabulary continuous speech recognition (LVCSR) the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can utilize a commodity graphics processing unit (GPU) to perform the acoustic computations to move this burden off the main processor. In this paper we describe our new GPU scheme that can achieve a very substantial improvement in recognition speed whilst incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models with varying complexity and the results consistently show by using the GPU it is possible to reduce the recognition time with largest improvements occurring in systems with large numbers of Gaussians. For the systems which achieve the best accuracy we obtained between 2.5 and 3 times speed-ups. The faster decoding times translate to reductions in space, power and hardware costs by only requiring standard hardware that is already widely installed.
(Paul R. Dixon, Tasuku Oonishi, Sadaoki Furui, “Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition”, Computer Speech & Language, Volume 23, Issue 4, October 2009, Pages 510-526, ISSN 0885-2308, DOI: 10.1016/j.csl.2009.03.005)
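For readers unfamiliar with the acoustic computation mentioned in the abstract, the following is a minimal CPU sketch of a per-frame Gaussian mixture log-likelihood, assuming diagonal covariances. It is not the authors' GPU scheme; the function names and the diagonal-covariance assumption are illustrative only. It does show why the cost grows with the number of Gaussians: every mixture component is evaluated for every feature frame, and each evaluation is independent of the others.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    acc = 0.0
    for xi, mi, vi in zip(x, mean, var):
        acc += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * acc

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a mixture, using log-sum-exp for stability."""
    # One term per mixture component; the terms do not depend on each other,
    # which is what makes this computation a candidate for parallel hardware.
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

A call such as `gmm_loglik(frame, weights, means, variances)` would be repeated for every frame and every active model state during decoding.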
CfP: GPU Computing Gems
February 9th, 2010
NVIDIA and Editor-in-Chief Professor Wen-mei Hwu of the University of Illinois, Urbana-Champaign invite you to submit articles for GPU Computing Gems, a contribution-based book that will focus on practical techniques for GPU computing. This is a continuation of the popular GPU Gems series.
The full Call for Participation is available here.
Programming Massively Parallel Processors: A Hands-on Approach
February 9th, 2010
The first textbook of its kind, Programming Massively Parallel Processors: A Hands-on Approach launches today, authored by Dr. David B. Kirk, NVIDIA Fellow and former chief scientist, and Dr. Wen-mei Hwu, who serves at the University of Illinois at Urbana-Champaign as Chair of Electrical and Computer Engineering in the Coordinated Science Laboratory, co-director of the Universal Parallel Computing Research Center and principal investigator of the CUDA Center of Excellence. The textbook, which is 256 pages, is the first aimed at teaching advanced students and professionals the basic concepts of parallel programming and GPU architectures. Published by Morgan Kaufmann, it explores various techniques for constructing parallel programs and reviews numerous case studies.
With conventional CPU-based computing no longer scaling in performance and the world’s computational challenges increasing in complexity, the need for massively parallel processing has never been greater. GPUs have hundreds of cores capable of delivering transformative performance increases across a wide range of computational challenges. The rise of these multi-core architectures has raised the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.
Among the book’s key features:
- First and only text that teaches how to program within a massively parallel environment
- Portions of the NVIDIA-provided content have been part of the curriculum at 300 universities worldwide
- Drafts of sections of the book have been tested and taught by Kirk at the University of Illinois
- Book utilizes OpenCL and CUDA C, the NVIDIA parallel computing language developed specifically for massively parallel environments
Programming Massively Parallel Processors: A Hands-on Approach is available to purchase from Amazon or directly from Elsevier.
CfP: High performance computational systems Biology
February 8th, 2010
The HiBi workshop establishes a forum to link researchers in the areas of parallel computing and computational systems biology. One of the main limitations in managing models of biological systems comes from the fundamental difference between the high parallelism evident in biochemical reactions and the sequential environments employed for the analysis of these reactions. Such limitations affect all varieties of continuous, deterministic, discrete and stochastic models, undermining the applicability of simulation techniques and analysis of biological models. The goal of HiBi is therefore to bring together researchers in the fields of high performance computing and computational systems biology. Experts from around the world will present their current work, discuss profound challenges, new ideas, results, applications and their experience relating to key aspects of high performance computing in biology.
Topics of interest include, but are not limited to:
- Parallel stochastic simulation
- Biological and Numerical parallel computing
- Parallel and distributed architectures
- Emerging processing architecture: Cell processors, GPUs, mixed CPU-FPGA, etc.
- Parallel model checking techniques
- Parallel parameter estimation
- Parallel algorithms for biological analysis
- Application of concurrency theory to biology
- Parallel visualization algorithms
- Web-services and Internet computing for e-Science
- Tools and applications
More Information: http://www.cosbi.eu/hibi2010/
CfP: Symposium on chemical computations on GP-GPUs
February 8th, 2010
The symposium will provide technical presentations from the companies advancing the development of GPUs, discussions of the challenges involved in effectively programming GPUs, and presentations on the use of GPUs in a range of chemical applications.
The deadline for submissions is 04/05/2010, and more information can be found at http://illinois.edu/lb/article/2101/33709.
CfP: High Performance Graphics 2010
February 7th, 2010
High-Performance Graphics 2010 continues last year’s success at synthesizing two important and cutting-edge topics in computer graphics, the previous Graphics Hardware and Interactive Ray Tracing conferences. The scope of the conference is the overarching field of performance-oriented graphics systems, covering innovative algorithms, efficient implementations, and hardware architecture. This broader focus offers a common forum bringing together researchers, engineers, and architects to discuss the complex interactions of massively parallel hardware, novel programming models, efficient graphics algorithms, and innovative applications.
The program features three days of paper and industry presentations, with ample time for discussions during breaks, lunches, and the conference banquet. The conference, which will take place on June 25-27, is co-located with the Eurographics Rendering Symposium on the campus of the Max-Planck Institut Informatik, Saarland University, Saarbrücken, Germany.
Original and innovative performance-oriented contributions are invited from all areas of graphics, including hardware architectures, rendering, physics, animation, AI, simulation, data structures, with topics including (but not limited to):
- New graphics hardware architectures
- Rendering architectures and algorithms
- Parallel computing for graphics (including GPU Computing)
- Algorithmic foundations
- Languages and compilation
The conference website with additional information is located at http://www.highperformancegraphics.org.
Triangular matrix inversion on Graphics Processing Unit
February 6th, 2010
Abstract:
Dense matrix inversion is a basic procedure in many linear algebra algorithms. A computationally arduous step in most dense matrix inversion methods is the inversion of triangular matrices as produced by factorization methods such as LU decomposition. In this paper, we demonstrate how triangular matrix inversion (TMI) can be accelerated considerably by using commercial Graphics Processing Units (GPU) in a standard PC. Our implementation is based on a divide and conquer type recursive TMI algorithm, efficiently adapted to the GPU architecture. Our implementation obtains a speedup of 34x versus a CPU-based LAPACK reference routine, and runs at up to 54 gigaflops/s on a GTX 280 in double precision. Limitations of the algorithm are discussed, and strategies to cope with them are introduced. In addition, we show how inversion of an L- and U-matrix can be performed concurrently on a GTX 295 based dual-GPU system at up to 90 gigaflops/s.
(Florian Ries, Tommaso De Marco, Matteo Zivieri and Roberto Guerrieri, Triangular Matrix Inversion on Graphics Processing Units, Supercomputing 2009, DOI 10.1145/1654059.1654069)
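The abstract refers to a divide and conquer type recursive TMI algorithm without spelling it out. As a rough illustration, the sketch below uses the standard block identity for a lower triangular matrix L = [[A, 0], [C, B]], whose inverse is [[A⁻¹, 0], [−B⁻¹ C A⁻¹, B⁻¹]]. Whether the paper uses exactly this decomposition is an assumption, and this pure-Python version makes no attempt at the GPU adaptation the authors describe. The helper names `matmul` and `invert_lower` are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def invert_lower(l):
    """Invert a nonsingular lower triangular matrix by recursive halving."""
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    h = n // 2
    a = [row[:h] for row in l[:h]]   # top-left triangular block
    c = [row[:h] for row in l[h:]]   # bottom-left rectangular block
    b = [row[h:] for row in l[h:]]   # bottom-right triangular block
    # The two half-size inversions do not depend on each other.
    a_inv = invert_lower(a)
    b_inv = invert_lower(b)
    # The off-diagonal block of the inverse is -(B^-1 C A^-1).
    corner = matmul(matmul(b_inv, c), a_inv)
    top = [row + [0.0] * (n - h) for row in a_inv]
    bottom = [[-v for v in crow] + brow for crow, brow in zip(corner, b_inv)]
    return top + bottom
```

In this formulation the work beyond the two independent sub-inversions consists of matrix multiplications, an operation that maps well onto parallel hardware.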