NVIDIA recently released an updated version of FX Composer with support for Shader Model 3.0 and many new sample shaders. The latest version also includes many developer-requested UI improvements, support for NV4x GPU performance analysis, and a beta SDK that allows developers to write their own automation scripts, importers/exporters, etc. The best way to view all the sample shaders is to download the latest NVIDIA SDK and look at the “Effects” tab, where you will find descriptions and screenshots of 250+ effects. (FX Composer Homepage)
Cg Release 1.3 Beta 2 has been released with support for the latest GeForce 6 Series (NV4X) GPUs. This version of Cg offers the following features and improvements:
- New vp40 profile, which enables texture sampling from within vertex programs
- New fp40 profile, which provides a robust branching model in fragment programs, and support for output to multiple draw buffers (“MRTs”)
- Support for writing more than one color output (i.e., MRTs) in the arbfp1 and ps_2* profiles
- New semantics to access OpenGL fixed-function state vectors from within ARB_vertex_program and ARB_fragment_program
- New “-fastprecision” option for arbfp*, fp30, and fp40 profiles, to use reduced precision storage (fp16) when appropriate
- Support for 16 profiles
This paper by Fan et al. at Stony Brook University presents the use of a cluster of commodity GPUs for high performance scientific computing. As an example application, they have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, their simulation can compute a 480 x 400 x 80 LBM in 0.31 seconds per step, which is 4.6 times faster than their previous CPU cluster implementation. Besides the LBM, the paper also discusses other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM. (Zhe Fan, Feng Qiu, Arie Kaufman, Suzanne Yoakum-Stover, GPU Cluster for High Performance Computing, To Appear in Proceedings of the ACM/IEEE SuperComputing 2004 (SC’04), November, 2004)
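To put the quoted figures in perspective, the short calculation below converts them into lattice-cell updates per second. It is only back-of-the-envelope arithmetic on the numbers in the summary above; the function name is hypothetical, and reading "4.6 times faster" as a ratio of per-step times is an assumption rather than something the summary states.

```python
def lbm_throughput(dims, seconds_per_step, nodes):
    """Return (cells, cell updates per second, cell updates per second per node)."""
    nx, ny, nz = dims
    cells = nx * ny * nz
    per_second = cells / seconds_per_step
    return cells, per_second, per_second / nodes


# Figures quoted in the summary: lattice size, step time, and node count.
cells, per_second, per_node = lbm_throughput((480, 400, 80), 0.31, 30)

# Assumption: "4.6 times faster" is read as a ratio of per-step times.
implied_cpu_seconds_per_step = 0.31 * 4.6

print(f"{cells:,} cells per step")
print(f"{per_second / 1e6:.1f} million cell updates per second overall")
print(f"{per_node / 1e6:.2f} million cell updates per second per GPU node")
print(f"{implied_cpu_seconds_per_step:.2f} s/step implied for the CPU cluster")
```

Under these assumptions the lattice holds about 15.4 million cells, so the cluster sustains roughly 50 million cell updates per second.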
Linear expressions constitute one of the most basic operations in scientific computations. This paper by Bajaj et al. proposes a SIMD code optimization technique that enables efficient shader code to be generated for evaluating linear expressions. Performance can be improved considerably by reordering the operations in linear expressions so that they pack efficiently into four-wide SIMD instructions. The authors demonstrate that this technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications. (SIMD Optimization of Linear Expressions for Programmable Graphics Hardware. C. Bajaj, I. Ihm, J. Min, and J. Oh)
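The idea of packing can be illustrated with a small emulation. The sketch below is not the paper's algorithm: it hand-packs four linear expressions that share the same inputs, so that each four-wide multiply-add handles one term of all four expressions at once. The helper names (`mad4`, `eval_scalar`, `eval_packed`) are hypothetical, and Python lists stand in for four-component shader registers.

```python
from typing import List, Sequence, Tuple

Vec4 = List[float]


def mad4(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Vec4:
    """Emulate one four-wide multiply-add: a * b + c, componentwise."""
    return [a[i] * b[i] + c[i] for i in range(4)]


def eval_scalar(coeffs: Sequence[Sequence[float]],
                x: Sequence[float]) -> Tuple[Vec4, int]:
    """Evaluate four linear expressions term by term, counting scalar multiply-adds."""
    ops = 0
    out = []
    for row in coeffs:
        acc = 0.0
        for a, xj in zip(row, x):
            acc += a * xj
            ops += 1
        out.append(acc)
    return out, ops


def eval_packed(coeffs: Sequence[Sequence[float]],
                x: Sequence[float]) -> Tuple[Vec4, int]:
    """Evaluate the same expressions with one four-wide multiply-add per input."""
    acc = [0.0] * 4
    ops = 0
    for j, xj in enumerate(x):
        # Gather the j-th coefficient of every expression into one register.
        column = [coeffs[i][j] for i in range(4)]
        acc = mad4(column, [xj] * 4, acc)
        ops += 1
    return acc, ops


# Four expressions in three unknowns; row i holds the coefficients of expression i.
coeffs = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
x = [1, 2, 3]
scalar_result, scalar_ops = eval_scalar(coeffs, x)
packed_result, packed_ops = eval_packed(coeffs, x)
print(scalar_result, scalar_ops)
print(packed_result, packed_ops)
```

In this toy case the packed version issues one instruction per input variable instead of one per term. Real expressions rarely line up this neatly, which is where reordering of operations becomes useful.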
Ne@tware Player 2004 supports the latest DirectX 9.0c graphics and media technologies. It allows you to design and watch visual special effects in real time. Its support for Shader Model 3.0 and the High Level Shader Language (HLSL) also makes Ne@tware Player a shader development platform for video processing on the Graphics Processing Unit. Fullscreen, Multithread Video Engine, Action Mapper, and International Languages are other new features. (http://www.neatware.com/player/)
New versions of the NVIDIA SDK and FX Composer with Shader Model 3.0 support are now available. SDK 8.0 includes hundreds of all-new Shader Model 3.0 code samples and effects, including three new GPGPU code samples:
- GPGPU Fluid, a fast, realistic fluid simulation
- GPGPU Disease, a creepy dynamic “disease” effect based on chemical reaction-diffusion
- GPU Particles, a fast particle system that can simulate 1 million particles at 20 fps on GeForce 6800
Following the success of GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics, NVIDIA have decided to produce a second GPU Gems volume to showcase the best new ideas and techniques for the latest programmable GPUs. Tentatively titled GPU Gems II: Techniques for Graphics and Compute Intensive Programming, this book will be edited by Matt Pharr, software engineer at NVIDIA.
NVIDIA are looking for ideas from developers who are using GPUs in new ways to create stunning graphics and cutting-edge applications. Chapters should present techniques and ideas that are broadly useful to GPU programmers and can be integrated into their applications. GPU Gems II will have an increased focus on chapters exploring non-graphics applications of the computational capabilities of GPU hardware.
To participate, read the submission guidelines and send an e-mail to firstname.lastname@example.org with your proposed chapter title as the subject line, and the required description in the e-mail body. The deadline for submissions is Monday, August 16, 2004.
At its World-Wide Developers Conference, Apple introduced Core Image as a feature of its upcoming Tiger release. Core Image is a framework for image processing on the GPU using a modified stream processing paradigm, and an interesting computational framework for offloading some general-purpose computations onto the GPU. It appears to be the first commercial effort to offer a general image computing environment for GPUs. The library comes with 100 basic plugins, called “Image Units”, and can be extended by developers. In its computing model, each kernel is expressed in a high-level language and computes a result image based on some number of input images. The kernels can be strung together in arbitrary image computation “graphs”, in a model similar to that described by Michael Shantzis in his 1994 paper A Model for Efficient and Flexible Image Computing. Registered Apple Developers (free registration) can access a pre-release version of Core Image.
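The graph-of-kernels model described above can be sketched in a few lines. This is a CPU-side illustration only and does not use or mirror the Core Image API: the `Node` and `Source` classes and the two kernels are hypothetical, images are plain nested lists of floats, and every kernel is assumed to be purely per-pixel with same-sized inputs.

```python
from typing import Callable, List

Image = List[List[float]]


class Node:
    """A graph node: a per-pixel kernel applied to one or more input nodes."""

    def __init__(self, kernel: Callable[..., float], *inputs: "Node") -> None:
        self.kernel = kernel
        self.inputs = inputs

    def render(self) -> Image:
        # Render every input first, then run the kernel once per pixel.
        rendered = [node.render() for node in self.inputs]
        height, width = len(rendered[0]), len(rendered[0][0])
        return [
            [self.kernel(*(img[y][x] for img in rendered)) for x in range(width)]
            for y in range(height)
        ]


class Source(Node):
    """A leaf node that simply supplies an existing image."""

    def __init__(self, image: Image) -> None:
        self.image = image

    def render(self) -> Image:
        return self.image


def brighten(p: float) -> float:
    """Hypothetical kernel: add a fixed offset and clamp to 1.0."""
    return min(1.0, p + 0.25)


def blend(a: float, b: float) -> float:
    """Hypothetical kernel: average two input pixels."""
    return 0.5 * (a + b)


# Build a two-stage graph: brighten image a, then blend the result with image b.
a = Source([[0.0, 0.5], [1.0, 0.25]])
b = Source([[1.0, 1.0], [0.0, 0.0]])
graph = Node(blend, Node(brighten, a), b)
result = graph.render()
print(result)
```

Nothing is computed until `render()` is called on the final node, which is one reason a graph representation is attractive: the whole chain is known before any pixel is processed.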
Jahshaka is an open-source, real-time editing, effects and image processing application that works in 3D space. The 1.9a8 release of Jahshaka, available today, supports GPU-accelerated image processing. The Jahshaka developers’ research in real-time image processing using the GPU is described in a white paper.
This paper presents an extensible system for interactively rendering multiple types of ray-casted objects in a manner compatible with pre-existing rendering engines. The sample implementation includes support for general quadrics and volumetric isosurfaces. It also includes a high-speed sphere renderer, and of course a standard triangle-rendering pipeline. The system is designed so that most of the algorithms designed to run on the existing raster engine can be added with minimal overhead/coding effort. We have demonstrated shadowing using the shadow-map algorithm. (“Beyond Triangles: A Simple Framework For Hardware-Accelerated Non-Triangular Primitives”, To be Submitted for publication.)