GPU4Vision Project

October 16th, 2008

GPU4Vision is a project founded by the Institute for Computer Graphics and Vision at Graz University of Technology that develops fast computer vision algorithms for tasks such as basic image processing, segmentation, motion estimation, and stereo. On the GPU4Vision website you can browse the project's latest scientific publications, watch demo videos of the algorithms, and even download some of them to evaluate on your own PC. (GPU4Vision – Website)

Larrabee: A Many-Core x86 Architecture for Visual Computing

August 12th, 2008

Abstract:

This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die 2nd level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely with software in Larrabee, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this architecture uses binning in order to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
(Larrabee: A Many-Core x86 Architecture for Visual Computing. Seiler, L., Carmean, D., Sprangle, D., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., Hanrahan, P. Proceedings of SIGGRAPH 2008.)
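Binning, as mentioned in the abstract, means sorting each triangle into the screen tiles it may touch before any pixels are shaded, so that each tile can later be rendered by one core from a small working set. The Python sketch below illustrates the idea under simple assumptions of our own: triangles are given in pixel coordinates, bins are chosen conservatively from each triangle's bounding box, and the 64-pixel tile size and the name `bin_triangles` are hypothetical. It is not Larrabee's actual binning code.

```python
import math
from collections import defaultdict


def bin_triangles(triangles, width, height, tile=64):
    """Map each screen tile (tx, ty) to the indices of triangles whose
    bounding box overlaps it, keeping submission order within each bin.

    triangles: list of ((x0, y0), (x1, y1), (x2, y2)) in pixel coordinates.
    tile: hypothetical tile edge length in pixels.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Conservative bounding-box test, clamped to the screen's tile range.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        # Empty ranges here mean the triangle lies entirely off-screen.
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


# Usage: two on-screen triangles and one off-screen triangle on a 128x128 target.
tris = [((10, 10), (20, 10), (10, 20)),
        ((60, 60), (70, 60), (60, 70)),
        ((-50, -50), (-10, -50), (-10, -10))]
print(bin_triangles(tris, 128, 128))
# {(0, 0): [0, 1], (1, 0): [1], (0, 1): [1], (1, 1): [1]}
```

Each bin keeps triangles in submission order, which a back end needs in order to preserve blending order. A tighter overlap test (for example, evaluating the triangle's edge equations against each tile) would remove false positives at the cost of more setup work per triangle.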


Case studies on GPU usage and data structure design

August 11th, 2008

Abstract

Big improvements in the performance of graphics processing units (GPUs) have turned them into a compelling platform for high performance computing. In this thesis, we discuss the usage of NVIDIA's CUDA in two applications: Einstein@Home, a distributed computing software, and OpenSteer, a game-like application. Our work on Einstein@Home demonstrates that CUDA can be integrated into existing applications with minimal changes, even in programs designed without considering GPU usage. However, the existing data structure of Einstein@Home performs poorly when used on the GPU. We demonstrate that a redesigned data structure makes the application about three times as fast as the original CPU version, even though the code executed on the device is not optimized. We further discuss the design of a novel spatial data structure called "dynamic grid" that is optimized for CUDA usage. We measure its performance by integrating it into the Boids scenario of OpenSteer. Our new concept outperforms a uniform grid by a margin of up to 15%, even though the dynamic grid still offers room for further optimization.

(Case studies on GPU usage and data structure design. J. Breitbart, Master's thesis, Universität Kassel, 2008)
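The dynamic grid itself is not described here, but the uniform grid it is compared against is a standard structure for Boids neighbor queries. The minimal Python sketch below assumes 2D positions and a cell size at least as large as the query radius. The names `build_grid` and `neighbors` are hypothetical. Each agent is hashed into a cell, and a query only inspects the 3x3 block of cells around the agent instead of every agent in the flock.

```python
import math
from collections import defaultdict


def build_grid(positions, cell):
    """Hash each 2D position into a uniform grid of square cells of size `cell`."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    return grid


def neighbors(i, positions, grid, cell, radius):
    """Return sorted indices of agents within `radius` of agent i (excluding i)."""
    # With cell >= radius, every agent in range lies in the 3x3 block around i's cell.
    assert radius <= cell
    x, y = positions[i]
    cx, cy = math.floor(x / cell), math.floor(y / cell)
    found = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            # .get avoids inserting empty cells into the defaultdict.
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i and math.hypot(positions[j][0] - x, positions[j][1] - y) <= radius:
                    found.append(j)
    return sorted(found)


# Usage: four boids, neighbor radius 1.0.
boids = [(0.5, 0.5), (1.2, 0.5), (3.0, 3.0), (0.9, 1.4)]
grid = build_grid(boids, cell=1.0)
print(neighbors(0, boids, grid, cell=1.0, radius=1.0))  # [1, 3]
```

On a GPU, the same structure is typically built by sorting agents by cell index rather than by appending to per-cell lists, which avoids dynamic allocation on the device.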

High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy

August 11th, 2008

This paper describes an implementation of fast deformable image registration using GPUs and CUDA for radiation therapy. On lung and prostate volumetric images, the GPU implementation is 40-66 times faster than a single-threaded CPU implementation and 25-41 times faster than a multithreaded one. The paradigm of GPU-based, near-real-time deformable image registration opens up a host of clinical applications for medical imaging. (High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, John D. Owens. Medical Physics, 2008.)

Faogen 2.0: Ambient occlusion calculation on the GPU

August 4th, 2008

Faogen is a Fast Ambient Occlusion Generator. It uses a GPU to accelerate the computation of ambient occlusion and bent normals, both as per-vertex data and as texture images. Faogen 2.0 provides updated ambient aperture and bent normal shaders that can be customized by editing two simple GLSL functions. Other features include improved precision on large-scale models, an adjustable background for AO texture images, lighting animation control, and bug fixes. (Faogen)

Semi-uniform Adaptive Patch Tessellation

August 4th, 2008

This paper by Dyken, Reimers, and Seland of the University of Oslo and SINTEF ICT presents an adaptive tessellation scheme for parametric patches that produces consistent and watertight tessellations. The scheme uses only a few base tessellations and is particularly well suited for use with instancing. In addition, a novel GPGPU bucket sort approach based on the HistoPyramid is presented. The paper gives implementation details and performance benchmarks. (Semi-uniform Adaptive Patch Tessellation. C. Dyken, M. Reimers, and J. Seland. Computer Graphics Forum, to appear.)
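A HistoPyramid is a stack of reduction levels over a count array. The base level holds how many outputs each input element produces, and every higher level holds the sums of its children. Each output slot then finds its source element by descending from the top, independently of all other slots. The sketch below is a simplified 1D version in Python (the original technique uses 2D textures with a 4:1 reduction), applied to stream compaction. The `bucket_sort` helper, which runs one compaction per bucket, is our own naive illustration of how compaction can yield a bucket sort, not the paper's algorithm. All function names are hypothetical.

```python
def build_histopyramid(counts):
    """Return the list of levels: level 0 is `counts` padded to a power of two,
    each higher level sums pairs of the level below, and the last level is [total]."""
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([below[2 * k] + below[2 * k + 1] for k in range(len(below) // 2)])
    return levels


def traverse(levels, key):
    """Descend from the top to find the input element that produces output `key`.
    Returns (element index, offset of `key` within that element's outputs)."""
    idx = 0
    for level in range(len(levels) - 2, -1, -1):
        left = levels[level][2 * idx]
        if key < left:
            idx = 2 * idx  # output lies in the left child's range
        else:
            key -= left    # skip the left child's outputs
            idx = 2 * idx + 1
    return idx, key


def compact(values, predicate):
    """Stream compaction: keep values satisfying `predicate`, in order.
    Each output slot is resolved independently, as one GPU thread would do."""
    levels = build_histopyramid([1 if predicate(v) else 0 for v in values])
    total = levels[-1][0]
    return [values[traverse(levels, k)[0]] for k in range(total)]


def bucket_sort(values, key, num_buckets):
    """Naive stable bucket sort: one HistoPyramid compaction per bucket (illustration only)."""
    out = []
    for b in range(num_buckets):
        out.extend(compact(values, lambda v, b=b: key(v) == b))
    return out


# Usage
print(compact([3, -1, 4, -5, 9], lambda v: v > 0))                          # [3, 4, 9]
print(bucket_sort([5, 1, 4, 2, 3, 0], key=lambda v: v // 2, num_buckets=3))  # [1, 0, 2, 3, 5, 4]
```

Because `traverse` also returns an offset within the source element, base counts greater than one let a single input emit several outputs. That is what makes the structure useful for expansion as well as compaction.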

Real-time Visual Tracker by Stream Processing

July 15th, 2008

This work describes the implementation of a real-time visual tracker that estimates the position and 3D pose of objects (specifically faces) in video sequences. Using GPUs for the computation, together with efficient sparse-template-based particle filtering, allows real-time processing even when tracking multiple faces simultaneously in high-resolution video frames. With a GPU and NVIDIA CUDA, speedups of up to ten times over a comparable CPU-only tracker are achieved. (Real-time Visual Tracker by Stream Processing. Oscar Mateo Lozano and Kazuhiro Otsuka. Journal of Signal Processing Systems.)
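The paper's sparse-template likelihood is specific to face tracking, but the particle-filter loop around it follows the usual predict, weight, and resample pattern. Every particle's prediction and weight can be computed independently, which is what makes the method a good fit for a GPU. The minimal 1D Python sketch below assumes a random-walk motion model and a Gaussian likelihood as hypothetical placeholders for the paper's 3D pose model and template matching. The function names and the `motion_std`/`obs_std` parameters are illustrative.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return particle indices drawn in proportion to `weights` using one random offset."""
    n = len(weights)
    step = sum(weights) / n
    u = rng.random() * step
    indices, i, cumulative = [], 0, weights[0]
    for _ in range(n):
        # Advance to the particle whose cumulative-weight interval contains u.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
        u += step
    return indices


def particle_filter_step(particles, observation, rng, motion_std=1.0, obs_std=2.0):
    """One predict/weight/resample cycle for a 1D state (placeholder models)."""
    # Predict: hypothetical random-walk motion model.
    moved = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: hypothetical Gaussian likelihood standing in for template matching.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # every weight underflowed: fall back to uniform
    return [moved[i] for i in systematic_resample(weights, rng)]


# Usage: track a target moving 0.5 units per frame, starting from a wide prior.
rng = random.Random(0)
particles = [rng.uniform(-20.0, 20.0) for _ in range(500)]
for t in range(1, 21):
    particles = particle_filter_step(particles, 0.5 * t, rng)
estimate = sum(particles) / len(particles)
print(round(estimate, 2))
```

Systematic resampling is used here because it needs only one random number per step. Since the random-walk model has no velocity term, the estimate lags slightly behind a moving target.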

Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs)

July 14th, 2008

Abstract:

In a previous publication, we examined the fundamental difference between computational precision and result accuracy in the context of the iterative solution of linear systems as they typically arise in the Finite Element discretization of Partial Differential Equations (PDEs). In particular, we evaluated mixed- and emulated-precision schemes on commodity graphics processors (GPUs), which at that time only supported computations in single precision. With the advent of graphics cards that natively provide double precision, this report updates our previous results.

We demonstrate that with new co-processor hardware supporting native double precision, such as NVIDIA's G200 and T10 architectures, the situation does not change qualitatively for PDEs, and the previously introduced mixed precision schemes are still preferable to double precision alone. However, the schemes achieve significant quantitative performance improvements on the more powerful hardware. In particular, we demonstrate that a Multigrid scheme can accurately solve a common test problem in Finite Element settings with one million unknowns in less than 0.1 seconds, which is truly outstanding performance. We support these conclusions by exploring the algorithmic design space enlarged by the availability of double precision directly in the hardware.

(Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs). Dominik Göddeke and Robert Strzodka. Technical Report, 2008.)
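The mixed-precision schemes discussed above are built around iterative refinement. The residual is computed and the solution is accumulated in double precision, while the correction equation is solved in cheaper single precision. The minimal Python sketch below assumes a tridiagonal test matrix and a few Jacobi sweeps as hypothetical stand-ins for the report's FEM matrices and Multigrid solver. Single precision is emulated by rounding through `struct`, and all function names are illustrative.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(n, x):
    """y = A x for the hypothetical test matrix A = tridiag(-1, 4, -1), in double."""
    y = []
    for i in range(n):
        s = 4.0 * x[i]
        if i > 0:
            s -= x[i - 1]
        if i < n - 1:
            s -= x[i + 1]
        y.append(s)
    return y


def jacobi_f32(n, r, sweeps):
    """Approximately solve A c = r with Jacobi sweeps, rounding every value to single precision."""
    r32 = [f32(v) for v in r]
    c = [0.0] * n
    for _ in range(sweeps):
        new = []
        for i in range(n):
            s = r32[i]
            if i > 0:
                s = f32(s + c[i - 1])
            if i < n - 1:
                s = f32(s + c[i + 1])
            new.append(f32(s / 4.0))
        c = new
    return c


def mixed_precision_solve(n, b, tol=1e-12, max_outer=50, sweeps=10):
    """Iterative refinement: double-precision residual and update, single-precision inner solve."""
    x = [0.0] * n
    for _ in range(max_outer):
        r = [bi - ai for bi, ai in zip(b, matvec(n, x))]  # residual in double precision
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        c = jacobi_f32(n, r, sweeps)                      # cheap low-precision correction
        x = [xi + ci for xi, ci in zip(x, c)]             # accumulate in double precision
    return x


# Usage
n = 50
b = [1.0] * n
x = mixed_precision_solve(n, b)
print(max(abs(bi - ai) for bi, ai in zip(b, matvec(n, x))))
```

Because each correction only needs to reduce the error by a constant factor, the single-precision rounding in the inner solver does not limit the final accuracy. That accuracy is bounded by the double-precision residual computation instead.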

CUDA.NET

July 10th, 2008

CUDA.NET is an effort by GASS to provide access to NVIDIA CUDA functionality from .NET applications. The library currently provides .NET bindings for CUDA functions, allowing programmers to use existing .NET applications as hosts for CUDA-enabled devices and thereby exposing a powerful co-processor to .NET code. The current distribution contains a .NET library that can be used from any .NET application and language, along with examples in C# and Python showing how to use the library. The API is straightforward and closely follows the NVIDIA CUDA API for C, with a few modifications to ease development and align with .NET conventions. See the CUDA.NET home page for more details.

NVIDIA appoints first CUDA center of excellence

July 4th, 2008

From the press release:

SANTA CLARA, CA & URBANA, IL, JUNE 30, 2008 - NVIDIA Corporation (Nasdaq: NVDA), the worldwide leader in visual computing technologies, and the University of Illinois at Urbana-Champaign (UIUC) today announced that UIUC has been named as the world's first CUDA Center of Excellence. In addition to the appointment, NVIDIA has donated $500,000 to UIUC for the development of parallel computing facilities and the continuation of its research programs.

"The CUDA Center of Excellence program rewards schools that truly embrace the concept of parallel processing as the future of computing," said Dr. David Kirk, chief scientist at NVIDIA. "Schools receiving this accreditation integrate the CUDA software environment into their curriculum to help their students harness the capabilities of these new parallel processing architectures. As one of the country's leading schools in this field, I am personally delighted to appoint UIUC as our first CUDA Center of Excellence."

The Theoretical and Computational Biophysics Group at UIUC was one of the first research groups to leverage the parallel architecture of the GPU to accelerate their research in the field of computational biophysics. They have successfully accelerated NAMD/VMD, a popular parallel molecular dynamics application that analyzes large biomolecular systems. It is hoped that this donation will aid this group, and others at the university, to further their work and speed them down the path to great discovery.

(Complete Press Release)
