Performance Evaluation of GPUs Using the RapidMind Development Platform

November 4th, 2006

This white paper from RapidMind and HP compares the performance of BLAS dense linear algebra operations, the FFT, and European option pricing on the GPU against highly tuned CPU implementations on the fastest available CPUs. All of the GPU implementations were made using the RapidMind Development Platform, which allows the use of standard C++ programming to create high-performance parallel applications that run on the GPU. The full source for the samples is available in conjunction with a new beta version of the RapidMind development platform. The results will also be presented as a poster at SC06.
(http://rapidmind.net/sc06_hp_rapidmind_cpugpu_summary.php)

PeakStream launches software platform to harness the power of next-generation multi-core processors

October 30th, 2006

PeakStream, Inc., a leading software application platform provider for the high performance computing (HPC) market, today unveiled the PeakStream Platform. Available immediately, the PeakStream Platform makes it possible to easily program new high performance processors such as multi-core CPUs, graphics processing units (GPUs) and Cell processors, converting them into radically powerful computing engines for exponentially increased application performance and decreased time-to-solution at reduced cost. The company also announced the completion of equity financing totaling $17 million from Kleiner Perkins Caufield & Byers, Sequoia Capital and Foundation Capital. (www.peakstreaminc.com)

Robust and Efficient Photo Consistency Estimation for Volumetric 3D Reconstruction

October 24th, 2006

The computational power of GPU-based algorithms is receiving increased attention in research on Computer Vision and 3D stereo reconstruction from images. In this context one of the most important ingredients for any 3D stereo reconstruction technique is the estimation of photo consistency. This ECCV06 paper presents a new, illumination invariant photo consistency measure for high quality, volumetric 3D reconstruction from calibrated images. In contrast to current standard methods such as normalized cross-correlation it supports unconstrained camera setups and non-planar surface approximations. The paper shows how this measure as well as the other important stages of the volumetric reconstruction pipeline can be implemented in a highly efficient way by exploiting current graphics processors. The authors’ GPU implementation achieves speedups up to a factor of 85 in comparison to CPU-based algorithms, and allows reconstruction of high quality models with computation times of only a few seconds to minutes, even for large numbers of cameras and high volumetric resolutions. (Robust and Efficient Photo-Consistency Estimation for Volumetric 3D Reconstruction. Alexander Hornung and Leif Kobbelt. European Conference on Computer Vision (ECCV 2006), LNCS, vol. 3952, Springer, 179-190.)

A Memory Model for Scientific Algorithms on Graphics Processors

October 4th, 2006

This Supercomputing 2006 paper by Govindaraju et al. presents a memory model to analyze and improve the performance of scientific algorithms on graphics processing units (GPUs). The memory model is based on texturing hardware, which uses a 2D block-based array representation to perform the underlying computations. It incorporates many characteristics of GPU architectures, including smaller cache sizes and 2D block representations, and uses the 3C’s model to analyze cache misses. Moreover, the paper presents techniques to improve the performance of nested loops on GPUs. To demonstrate the effectiveness of the model, the paper highlights its performance on three memory-intensive scientific applications: sorting, the Fast Fourier Transform, and dense matrix multiplication. The authors’ cache-efficient algorithms for these applications achieve memory throughput of 30-50 GB/s on an NVIDIA 7900 GTX GPU. The paper also compares its results with prior GPU-based and CPU-based implementations on high-end processors, against which the authors report a 2x-5x performance improvement. (A Memory Model for Scientific Algorithms on Graphics Processors)
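To make the idea of block-based, cache-friendly loop nests concrete, here is a minimal sketch of a tiled dense matrix multiplication. It is a generic CPU-style illustration of loop blocking, not the paper's GPU algorithm; the function name `blocked_matmul` and the `block` parameter are assumptions made for this example.

```python
def blocked_matmul(A, B, block=2):
    """Multiply A (n x m) by B (m x p), visiting the operands tile by tile.

    Illustrative only: `block` is a hypothetical tile edge length. A real
    implementation would choose it so that the tiles fit in the cache.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay
    # inside one tile, so the same small blocks of A, B and C are reused
    # before the computation moves on.
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            for k0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to the naive triple loop; only the order in which elements are touched changes, which is the property a cache-miss analysis is concerned with.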

Free gDEBugger License for Academic Users

October 4th, 2006

The OpenGL ARB and Graphic Remedy have crafted an Academic Program to make the full-featured gDEBugger OpenGL debug toolkit available for use in your daily work and research – free of charge! gDEBugger is a powerful OpenGL and OpenGL ES debugger and profiler delivering one of the most intuitive OpenGL development toolkits available for graphics application developers. The ARB/Graphic Remedy Academic Program will run for one year, during which time any OpenGL developer who is able to confirm they are in academia will receive an Academic gDEBugger License from Graphic Remedy at no cost. This license will be valid for one year and will include all gDEBugger software updates as they become available. Academic licensees may also optionally purchase an annual support contract for the software at a reduced rate. For further information, visit:
http://academic.gremedy.com and
http://www.opengl.org/pipeline/article/vol001_3/

Supercomputing ’06 Workshop: "General-Purpose GPU Computing: Practice And Experience"

August 21st, 2006

SC’06 is proud to announce the “General-Purpose GPU Computing: Practice and Experience” workshop. This workshop features invited speakers and poster presenters who provide insights into current GPGPU practice and experience, and chart future directions in heterogeneous and homogeneous multi-core processor architectures and data-parallel processor architectures such as GPUs. The topics addressed by the speakers range from current GPGPU practice and experience to future issues and research areas in parallel computing currently being driven by GPGPU innovations and lessons learned, such as the IBM Cell Broadband Engine and Sun Microsystems’ Niagara/Sun4v processor. Poster presentations are solicited in, but not strictly limited to, the following areas:

  • Application acceleration
  • GPGPU/multi-core/parallel coprocessor integration: toolkits, implementation techniques (e.g., iterative refinement, numerical techniques), domain-specific languages
  • GPGPU implementation issues: performance issues and challenges, cooperative GPU/CPU algorithms and solutions, numerical analysis issues, HPC issues
  • Cluster-based GPGPU computing and Grid integration

Please submit prospective poster abstracts in PDF or PostScript format to the workshop chair for consideration and review. Poster abstract submission deadline: No later than October 1st. (For more information see http://www.gpgpu.org/sc2006/workshop/)

EM Photonics releases free GPU-Based FDTD Accelerator

August 18th, 2006

EM Photonics, Inc., a leading provider of accelerated hardware technologies, released FastFDTD, a free 2D and 3D accelerated FDTD solver based on GPU technology. The FastFDTD toolkit contains all files and documentation necessary to accelerate FDTD computations using a simple input file format. The 2D and 3D solvers include a variety of sources and materials, and more are being added. When asked why EM Photonics was providing this toolkit for free, Eric Kelmelis, Vice President, said:

“We decided to release our GPU-based FDTD accelerator free of charge to demonstrate the power of application acceleration with alternative computational platforms. This solver shows a single graphics card running 20-30 times faster than an optimized software implementation. Our focus will remain on pushing the boundaries of this technology and accelerating other applications with commodity hardware devices such as graphics cards and FPGAs.”

For more information, including specific feature sets, compatible graphics cards, and detailed license information, please visit the FastFDTD webpage at http://www.emphotonics.com/fastfdtd.html

GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

August 14th, 2006

This paper presents a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). The approach is competitive with common sequential sorting algorithms not only from a theoretical viewpoint; it is also very fast in practice. The paper presents an implementation on modern programmable graphics hardware (GPUs). On recent GPUs this optimal parallel sorting approach has been shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. (GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures. Alexander Gress and Gabriel Zachmann. Proc. 20th IEEE Int’l Parallel and Distributed Processing Symposium (IPDPS), 2006.)
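For readers unfamiliar with bitonic sorting, the sketch below shows the classic, non-adaptive bitonic sorting network, which performs O(n log² n) comparisons. It is not the adaptive variant the paper uses to reach O(n log n); it only illustrates why this family of algorithms maps well to stream hardware: within each pass, every compare-and-swap is independent of the others. The function name `bitonic_sort` is chosen for this example.

```python
def bitonic_sort(values):
    """Sort a sequence whose length is a power of two with a bitonic network.

    Classic (non-adaptive) bitonic sort, shown for illustration only.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2  # size of the bitonic sequences being merged in this stage
    while k <= n:
        j = k // 2  # distance between the two elements of each comparison
        while j >= 1:
            # All iterations of this loop touch disjoint pairs, so on
            # parallel hardware they could run at the same time.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the eight values in ascending order. The fixed, data-independent comparison pattern is what makes the network attractive on GPUs, at the cost of the extra log n factor that the adaptive approach removes.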

A New Low-Level Interface for GPGPU Applications on ATI GPUs

August 10th, 2006

At SIGGRAPH in Boston, Derek Gerstmann of ATI presented a sketch titled “A Performance-Oriented Data Parallel Virtual Machine for GPGPU Applications.” The system exposes GPU functionality at a low level (including the fragment processors’ native instruction set), giving the programmer direct control over program compilation and loading, GPU memory management, and GPU/CPU synchronization. A write-up is available at www.ati.com/developer. If you are interested in obtaining the system for evaluation, please contact researcher@ati.com.

SIGGRAPH Poster: Extended-Precision Floating-Point Numbers for GPU Computation

August 10th, 2006

Using unevaluated sums of paired or quadrupled single-precision (f32) values, double-float (df64) and quad-float (qf128) numeric types can be implemented on current GPUs and used efficiently and effectively for extended-precision computation in real and complex arithmetic. These numeric types provide 48 and 96 bits of precision respectively at f32 exponent ranges for computer graphics and general-purpose (GPGPU) programming. Double- and quad-floats may be useful not only for extending available precision but also for accurate computation with single-precision floats that are only partially IEEE compliant. The poster and demos presented at ACM SIGGRAPH 06 discussed the implementation and application of these numbers in the Cg language for real and complex GPU programming. The df64 library includes math routines for exponential, log, and trigonometric functions. The poster can be downloaded from Andrew Thall’s website. Technical details will be available shortly, and the code itself will be made available for distribution given sufficient interest.
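The "unevaluated sum" representation can be sketched in a few lines. The example below emulates f32 arithmetic in Python by rounding every intermediate result to single precision, then builds a double-float addition from Knuth's error-free two-sum. It is an illustration under stated assumptions, not the poster's Cg code: the helper names (`f32`, `two_sum`, `df64_from_float`, `df64_add`) are invented here, and the addition shown is one common simplified variant.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Error-free addition of two f32 values (Knuth's two-sum).

    Returns (s, e) where s is the rounded f32 sum and e is the rounding
    error, so that s + e equals a + b exactly.
    """
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def df64_from_float(v):
    """Split a value into an unevaluated (hi, lo) pair of f32 numbers."""
    hi = f32(v)
    lo = f32(v - hi)
    return hi, lo


def df64_add(x, y):
    """Add two double-floats given as (hi, lo) pairs (simplified variant)."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    # Renormalize so that hi carries the leading bits and lo the remainder.
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo
```

In plain f32 arithmetic, adding 1e-9 to 1.0 leaves 1.0 unchanged because the increment falls below the f32 rounding threshold. With `df64_add(df64_from_float(1.0), df64_from_float(1e-9))` the small term survives in the `lo` component, which is the effect the double-float type is designed to provide.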
