Intel Ct Tera-Scale White Paper

November 5th, 2007

From the introduction: “Processor architecture is evolving towards more software-exposed parallelism through two features: more cores and wider SIMD ISA. At the same time, graphics processors (GPUs) are gradually adding more general purpose programming features. Several software development challenges arise from these trends. First, how do we mitigate the increased software development complexity that comes with exposing parallelism to the developer? Second, how do we provide portability across (increasing) core counts and SIMD ISA? Ct is a deterministic parallel programming model intended to leverage the best features of emerging general-purpose GPU (GPGPU) programming models while fully exploiting CPU flexibility. A key distinction of Ct is that it comprises a top-down design of a complete data parallel programming model, rather than being driven bottom-up by architectural limitations, a flaw in many GPGPU programming models.” (Flexible Parallel Programming for Terascale Architectures with Ct)

NVIDIA Releases CUDA for GPU Computing

February 16th, 2007

A beta of NVIDIA’s CUDA development environment, NVIDIA’s new technology for computing with GPUs, is now posted. This beta release of CUDA contains a C compiler for the GPU and an SDK with examples to get you started coding for the GPU. From the press release:

GPU Computing with CUDA is a new approach to computing where hundreds of on-chip processors simultaneously communicate and cooperate to solve complex computing problems. Applications that require mathematically intensive computing on large amounts of data are ideal targets for GPU Computing. NVIDIA’s CUDA technology is available in GeForce 8800 graphics products and future NVIDIA Quadro Professional Graphics solutions based on 8-series (G8X) GPUs. Developers are invited to download the beta version of the CUDA Software Developers Kit (SDK) and C compiler for Windows XP and Linux (RedHat Release 4 Update 3) from the NVIDIA Developer Web site. GPU Computing Forums for news, discussion and programming tips are also available.

NVIDIA Announces CUDA GPU Computing Architecture

November 10th, 2006

NVIDIA Corporation today unveiled NVIDIA CUDA technology, a new architecture for computing on NVIDIA GPUs, and the industry’s first C-compiler development environment for the GPU. From the NVIDIA Press Release:

GPU computing with CUDA is a new approach to computing where hundreds of on-chip processor cores simultaneously communicate and cooperate to solve complex computing problems up to 100 times faster than traditional approaches. This breakthrough architecture is complemented by another first: the NVIDIA C-compiler for the GPU. This complete development environment gives developers the tools they need to solve new problems in computation-intensive applications such as product design, data analysis, technical computing, and game physics. CUDA-enabled GPUs offer dedicated features for computing, including the Parallel Data Cache, which allows 128, 1.35 GHz processor cores in newest generation NVIDIA GPUs to cooperate with each other while performing intricate computations. Developers access these new features through a separate computing driver that communicates with DirectX and OpenGL, and the new NVIDIA C compiler for the GPU, which obsoletes streaming languages for GPU computing.

A New Low-Level Interface for GPGPU Applications on ATI GPUs

August 10th, 2006

At SIGGRAPH in Boston, Derek Gerstmann of ATI presented a sketch titled “A Performance-Oriented Data Parallel Virtual Machine for GPGPU Applications.” The system exposes GPU functionality at a low level (including the fragment processors’ native instruction set), giving the programmer direct control over program compilation and loading, GPU memory management, and GPU/CPU synchronization. A write-up is available online. If you are interested in obtaining the system for evaluation, please contact ATI.

Sh Version 0.8rc0 Released

November 10th, 2005

Sh Version 0.8.0rc0, the first release candidate for the upcoming Sh 0.8, is now available. There are plenty of new features and bug fixes, but most importantly, the API in this release completely matches the book Metaprogramming GPUs with Sh, and the 0.8.x series of releases will stick to that API.

Sh Version 0.7.8 Released

July 1st, 2005

A new version of the Sh language for GPU programming in C++ has been released. This version features a new backend infrastructure that makes it possible, for example, to run part of a stream application on the GPU and part on the CPU at the same time. It also includes many other fixes and platform compatibility enhancements.

Sh Version 0.7.7 Released

April 27th, 2005

Version 0.7.7 of the Sh GPU Metaprogramming Language is now released. Sh allows GPUs to be programmed directly using C++. This version features a back end for the OpenGL Shading Language, Mac OS X support, and major speed improvements for stream programs (the GPGPU subset of Sh).

Scout: A Hardware-Accelerated System for Quantitatively Driven Visualization and Analysis

October 20th, 2004

This IEEE Visualization 2004 paper by McCormick et al. describes the Scout System and Language that allow the GPU to be programmed for scientific visualization. Scout uses a data parallel language that allows the user to program visual mappings from data values to the final rendered result. These techniques can be used to replace standard user interface components, such as the transfer function editor commonly used in volume rendering. (“Scout: A Hardware-Accelerated System for Quantitatively Driven Visualization and Analysis”, Patrick S. McCormick, Jeff Inman, James P. Ahrens, Chuck Hansen and Greg Roth, In Proceedings IEEE Visualization 2004, pages 171-178, October 2004.)

Cg 1.3 Beta 2 Released

August 19th, 2004

Cg Release 1.3 Beta 2 has been released with support for the latest GeForce 6 Series (NV4X) GPUs. This version of Cg offers the following features and improvements:

  • New vp40 profile, which enables texture sampling from within vertex programs
  • New fp40 profile, which provides a robust branching model in fragment programs, and support for output to multiple draw buffers (“MRTs”)
  • Support for writing more than one color output (i.e., MRTs) in the arbfp1 and ps_2* profiles
  • New semantics to access OpenGL fixed-function state vectors from within ARB_vertex_program and ARB_fragment_program
  • New “-fastprecision” option for arbfp*, fp30, and fp40 profiles, to use reduced precision storage (fp16) when appropriate
  • Support for 16 profiles


Ashli 1.4.0 Released

June 25th, 2004

ATI’s Ashli version 1.4.0 has been released and is available for download from the Ashli home page. Ashli is a toolkit intended to assist developers exploring programmable shading on GPUs. It supports a reasonable subset of the OpenGL (GLSL), Microsoft DirectX (HLSL) and RenderMan shading languages. Ashli’s significant contribution is in hardware resource virtualization, segmenting a complex shader program into GPU-realizable streams. The posted Ashli viewer application demonstrates the use of shader partitions in a multi-pass rendering context. Ashli outputs both metadata and code, orthogonal to any of the languages supported. Targets include OpenGL ARB_vertex_program and ARB_fragment_program, and the DirectX 9.0 Vertex Shader and Pixel Shader version 2.0 and 2.X APIs. Optionally, Ashli emits a unified Microsoft FX file format, embedding progressive techniques of state and code sections. (Ashli 1.4.0)