Faogen is a Fast Ambient Occlusion Generator. It uses a GPU to accelerate computation of ambient occlusion and bent normals, both as per-vertex data and in texture images. Faogen 2.0 provides updated ambient aperture and bent normal shaders that can be customized by editing two simple GLSL functions. Other features include improved precision on large-scale models, an adjustable background for AO texture images, lighting animation control, and bug fixes. (Faogen)
This paper by Dyken, Reimers, and Seland of University of Oslo and SINTEF ICT presents an adaptive tessellation scheme for parametric patches producing consistent and watertight tessellations. The scheme uses only a few base tessellations and is particularly well suited for use with instancing. In addition, a novel GPGPU bucket sort approach based on HistoPyramid is presented. The paper gives implementational details and performance benchmarks. (Semi-uniform Adaptive Patch Tessellation. C. Dyken, M. Reimers, and J. Seland. Computer Graphics Forum, to appear.)
This work describes the implementation of a real-time visual tracker that targets the position and 3D pose of objects (specifically faces) in video sequences. The use of GPUs for the computation, together with efficient sparse-template-based particle filtering, allows real-time processing even when tracking multiple faces simultaneously in high-resolution video frames. Using a GPU and the NVIDIA CUDA technology, the tracker runs up to ten times faster than a similar CPU-only tracker. (Real-time Visual Tracker by Stream Processing. Oscar Mateo Lozano and Kazuhiro Otsuka. Journal of Signal Processing Systems.)
Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs)
July 14th, 2008
In a previous publication, we have examined the fundamental difference between computational precision and result accuracy in the context of the iterative solution of linear systems as they typically arise in the Finite Element discretization of Partial Differential Equations (PDEs). In particular, we evaluated mixed- and emulated-precision schemes on commodity graphics processors (GPUs), which at that time only supported computations in single precision. With the advent of graphics cards that natively provide double precision, this report updates our previous results.
We demonstrate that with new co-processor hardware supporting native double precision, such as NVIDIA’s G200 and T10 architectures, the situation does not change qualitatively for PDEs, and the previously introduced mixed precision schemes are still preferable to double precision alone. But the schemes achieve significant quantitative performance improvements with the more powerful hardware. In particular, we demonstrate that a Multigrid scheme can accurately solve a common test problem in Finite Element settings with one million unknowns in less than 0.1 seconds, which is truly outstanding performance. We support these conclusions by exploring the algorithmic design space enlarged by the availability of double precision directly in the hardware.
(Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations (Part 2: Double Precision GPUs). Dominik Göddeke and Robert Strzodka. Technical Report, 2008.)
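The mixed-precision idea in the abstract above can be illustrated with iterative refinement, one common form of such a scheme: an inner solver runs in low precision, while the residual and the solution update are kept in double precision. The following is a minimal sketch under stated assumptions, not the report's Multigrid solver. It assumes the 1-D Poisson matrix tridiag(-1, 2, -1), uses the Thomas algorithm as the inner solver, and emulates single precision on the CPU by rounding through `struct`. All function names are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_tridiag_single(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1); every result is rounded to single."""
    n = len(rhs)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = to_single(-1.0 / 2.0)
    d[0] = to_single(to_single(rhs[0]) / 2.0)
    for i in range(1, n):
        denom = to_single(2.0 + c[i - 1])  # b_i - a_i * c'_{i-1} with a_i = -1
        c[i] = to_single(-1.0 / denom)
        d[i] = to_single(to_single(to_single(rhs[i]) + d[i - 1]) / denom)
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = to_single(d[i] - to_single(c[i] * x[i + 1]))
    return x


def residual(x, rhs):
    """Residual rhs - A x, computed in double precision."""
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(rhs[i] - (2.0 * x[i] - left - right))
    return r


def mixed_precision_solve(rhs, iterations=5):
    """Iterative refinement: single-precision inner solves, double-precision updates."""
    x = [0.0] * len(rhs)
    for _ in range(iterations):
        correction = solve_tridiag_single(residual(x, rhs))
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x


if __name__ == "__main__":
    n = 50
    rhs = [1.0] * n
    exact = [(j + 1) * (n - j) / 2.0 for j in range(n)]  # closed-form solution
    x = mixed_precision_solve(rhs)
    print(max(abs(a - b) for a, b in zip(x, exact)))
```

The design point is that the expensive work (the inner solve) happens in the fast, low-precision format, while the cheap residual computation in double precision is what restores the accuracy of the final result.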
CUDA.NET is an effort by GASS to provide access to NVIDIA CUDA functionality from .NET applications. The library currently provides .NET bindings for CUDA functions, allowing programmers to use existing .NET applications as hosts for CUDA-enabled devices and thereby exposing a powerful co-processor to .NET. The current distribution contains a .NET library that can be used from any .NET application and language, along with examples in C# and Python showing how to use the library. The API is straightforward and compatible with the NVIDIA CUDA API available for C applications, with a few modifications to ease development and align with .NET standards. See the CUDA.NET home page for more details.
From the press release:
SANTA CLARA, CA & URBANA, IL - JUNE 30, 2008 - NVIDIA Corporation (Nasdaq: NVDA), the worldwide leader in visual computing technologies, and the University of Illinois at Urbana-Champaign (UIUC) today announced that UIUC has been named as the world’s first CUDA Center of Excellence. In addition to the appointment, NVIDIA has donated $500,000 to UIUC for the development of parallel computing facilities and the continuation of its research programs.
“The CUDA Center of Excellence program rewards schools that truly embrace the concept of parallel processing as the future of computing”, said Dr. David Kirk, chief scientist at NVIDIA. “Schools receiving this accreditation integrate the CUDA software environment into their curriculum to help their students harness the capabilities of these new parallel processing architectures. As one of the country’s leading schools in this field, I am personally delighted to appoint UIUC as our first CUDA Center of Excellence.”
The Theoretical and Computational Biophysics Group at UIUC was one of the first research groups to leverage the parallel architecture of the GPU to accelerate their research in the field of computational biophysics. They have successfully accelerated NAMD/VMD, a popular parallel molecular dynamics application that analyzes large biomolecular systems. It is hoped that this donation will aid this group, and others at the university, to further their work and speed them down the path to great discovery.
FEAST is a hardware-oriented MPI-based Finite Element solver toolkit. With the extension FEASTGPU, the authors have previously demonstrated that significant speed-ups in the solution of the scalar Poisson problem can be achieved by adding GPUs as scientific co-processors to a commodity-based cluster. In this paper the authors put a more general claim to the test: applications based on FEAST, which have so far run only on CPUs, can be successfully accelerated on a co-processor-enhanced cluster without any code modifications. The chosen solid mechanics code has higher accuracy requirements and a more diverse CPU/co-processor interaction than the Poisson example, and is thus better suited to assess the practicability of the acceleration approach. The paper presents accuracy experiments, a scalability test and acceleration results for different elastic objects under load. In particular, it demonstrates in detail that the single precision execution of the co-processor does not affect the final accuracy. The paper establishes how the local acceleration gains of factors 5.5 to 9.0 translate into 1.6- to 2.6-fold total speed-up. Subsequent analysis reveals which measures will increase these factors further. (Dominik Göddeke, Hilmar Wobker, Robert Strzodka, Jamaludin Mohd-Yusof, Patrick McCormick, Stefan Turek. Co-Processor Acceleration of an Unmodified Parallel Solid Mechanics Code with FEASTGPU. International Journal of Computational Science and Engineering (to appear).)
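Why a local gain of 5.5 to 9.0 shrinks to a total gain of 1.6 to 2.6 can be illustrated with Amdahl's law: only the fraction of the runtime that is offloaded to the co-processor benefits from the local speed-up. The sketch below is a generic model and may differ from the analysis used in the paper. The function name and the accelerated fractions in the usage loop are hypothetical values chosen for illustration, not figures from the paper.

```python
def total_speedup(accelerated_fraction, local_speedup):
    """Amdahl's law: overall speed-up when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime that is offloaded (0..1).
    local_speedup: speed-up factor achieved on that offloaded share.
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical accelerated fractions, for illustration only.
    for fraction in (0.5, 0.7):
        for local in (5.5, 9.0):
            print(fraction, local, round(total_speedup(fraction, local), 2))
```

Under this model, raising the accelerated fraction matters more than raising the local speed-up once the latter is already large.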
In this tutorial, NVIDIA engineers and academic and industrial researchers will present CUDA and discuss its advanced use for science and engineering. The tutorial will demonstrate CUDA with traditional HPC examples including BLAS, FFT, and integration with Fortran and high-level languages (MATLAB, Mathematica, Python) and describe in detail the programming model at the heart of it all. It will then turn to advanced topics including optimizing CUDA programs, CUDA floating point performance and accuracy, and CUDA programming strategies and tips. Finally the tutorial will present detailed case studies in which domain scientists will describe their experience using CUDA to accelerate mature, deployed, real-world science codes. Scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes (see http://www.nvidia.com/cuda for a list of codes, academic papers and commercial packages based on CUDA). Presenters include Massimiliano Fatica (NVIDIA), Mark Harris (NVIDIA), Patrick LeGresley (NVIDIA), and Jim Phillips (UIUC). Follow this link to register.
The University of Maryland is sponsoring a GPGPU programming contest. All entries will be released under version 3 of the GPL at the conclusion of the contest. Contestants are asked to submit code for sparse matrix multiplication. UMD will evaluate entries on both vector/sparse matrix and sparse matrix/sparse matrix multiplications, using a variety of inputs. As the contest progresses, UMD will update the LeaderBoard regularly, so contestants will have some idea of where they stand. Contestants are welcome to make as many entries as they want, so submit early and then tweak your designs. Entries can be written in either GLSL or CUDA. Prizes include NVIDIA Quadro FX 5600 GPUs, sponsored by NVIDIA. (http://scriptroute.cs.umd.edu/gpucompete/)
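For readers unfamiliar with the contest task, here is a minimal CPU reference sketch of a sparse matrix times dense vector product. The compressed sparse row (CSR) layout is an assumption made for illustration, since the announcement does not specify a storage format, and the function name is hypothetical. Contest entries themselves must be written in GLSL or CUDA.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR sparse matrix by a dense vector x.

    values:  nonzero entries, stored row by row.
    col_idx: column index of each entry in values.
    row_ptr: row r owns entries values[row_ptr[r]:row_ptr[r + 1]].
    """
    y = []
    for row in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y


# The 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # prints [7.0, 6.0, 17.0]
```

Each output row is independent of the others, which is what makes the operation a natural fit for one-thread-per-row (or finer-grained) GPU parallelization.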
This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common operation in spatial databases with many applications. A similarity join takes two sets of points A and B and returns all pairs (p, q) with p ∈ A and q ∈ B whose distance satisfies D(p,q) ≤ ε. An algorithm named LSS is presented that executes on a GPU, taking advantage of the GPU’s parallelism and large data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiencies and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)
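The sort-and-search idea can be sketched on the CPU as follows. This is a deliberate simplification and not the LSS algorithm: instead of space-filling curves it sorts B along the first coordinate only, and it runs sequentially. The property it shares with the description above is that the searched interval is guaranteed to contain every matching point, because a difference in one coordinate never exceeds the Euclidean distance. The function name is hypothetical, and Euclidean distance is assumed for D.

```python
import bisect
import math


def similarity_join(a, b, eps):
    """Return all pairs (p, q), p in a and q in b, with Euclidean distance <= eps."""
    # Sort step: order B along the first coordinate (a stand-in for the
    # space-filling curve ordering described in the paper).
    b_sorted = sorted(b)
    keys = [q[0] for q in b_sorted]
    result = []
    for p in a:
        # Search step: binary-search the interval [p[0] - eps, p[0] + eps].
        lo = bisect.bisect_left(keys, p[0] - eps)
        hi = bisect.bisect_right(keys, p[0] + eps)
        # Filter step: keep only candidates within the true distance.
        for q in b_sorted[lo:hi]:
            if math.dist(p, q) <= eps:
                result.append((p, q))
    return result


if __name__ == "__main__":
    A = [(0.0, 0.0), (5.0, 5.0)]
    B = [(0.5, 0.0), (0.0, 2.0), (5.0, 5.5), (9.0, 9.0)]
    print(similarity_join(A, B, 1.0))
```

A single sorted coordinate prunes poorly in high dimensions, since many far-apart points share a similar first coordinate; keeping nearby points close together in the sorted order is the role of the space-filling curves in the paper.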