Legacy GPGPU Glossary

This glossary explains some of the important terms frequently encountered in legacy GPGPU programming. It was last updated years ago and does not define the latest GPGPU terminology or technology.

The OpenGL Architecture Review Board (ARB). The ARB is responsible for overseeing the evolution of the OpenGL specification and standardizing OpenGL extensions.

Copy To Texture (CTT). This is a method of using rendered pixel data as a texture: the rendered pixels are copied out of the frame buffer into a separate texture. RTT is generally preferred over CTT when it is available, and CTT is considered a fallback. However, for very small textures on some architectures, CTT can sometimes be faster.

In the IEEE floating point standard, a denormal is a nonzero value whose floating point representation has exponent bits that are all zero (and a mantissa that is not all zero). Denormals provide a means of “gradual underflow”, in which values very near to zero gradually lose precision, avoiding a “gap” between the smallest representable normalized value and zero. GPUs do not generally contain hardware support for denormal numbers, however, which is one of the reasons that GPU arithmetic is not entirely IEEE-compliant. Any value that would have been a denormal is flushed to zero by the GPU. This avoids a special case that would add significant complexity to the floating point logic of the GPU, for a considerable savings in hardware.

See also: Wikipedia article on denormals
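The definition above can be illustrated with a minimal sketch. It uses Python's 64-bit doubles rather than the 32-bit floats typical of GPUs, and the helper names (`exponent_bits`, `is_denormal`, `flush_to_zero`) are hypothetical, chosen only for this illustration. `flush_to_zero` imitates in software what the entry describes the hardware doing; it is not taken from any GPU API.

```python
import math
import struct
import sys


def _raw_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 bit pattern of a Python float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits


def exponent_bits(x: float) -> int:
    """Return the 11-bit biased exponent field of a double."""
    return (_raw_bits(x) >> 52) & 0x7FF


def is_denormal(x: float) -> bool:
    """A denormal has an all-zero exponent field and a nonzero mantissa."""
    mantissa = _raw_bits(x) & ((1 << 52) - 1)
    return exponent_bits(x) == 0 and mantissa != 0


def flush_to_zero(x: float) -> float:
    """Software imitation of hardware that replaces denormals with zero."""
    return math.copysign(0.0, x) if is_denormal(x) else x


smallest_normal = sys.float_info.min  # smallest normalized double
below_normal = smallest_normal / 2    # falls into the denormal range

print(is_denormal(smallest_normal), is_denormal(below_normal))
print(flush_to_zero(below_normal))
```

With IEEE-style gradual underflow, `below_normal` keeps a nonzero value at reduced precision; with flush-to-zero behavior it simply becomes zero.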

Dynamic branching is a means of evaluating the control flow of a program dynamically at runtime. Only the most recent GPUs support dynamic branching. Furthermore, GPUs that do support dynamic branching may incur a measurable branch penalty from its use, so it should be used sparingly.

NV3x does not entirely support dynamic branching, by which we mean “true” branching (a jump from one program counter value to another, non-consecutive program counter value at runtime). NV3x-class and NV4x-class vertex processors both support “unstructured” dynamic branching, that is, “goto”-type functionality. NV3x-class fragment processors, on the other hand, support no dynamic branching at all, only predication (described below). Dynamic branching in a fragment program was one of the long-awaited features that finally arrived in NV4x: NV4x fragment processors support “structured” dynamic branching (if/then/else, for loops, etc.).

The opposite of “dynamic” branching, then, is branching that can be evaluated a priori at compile time, that is, branches that can be completely unrolled by the compiler. The common example is a for loop that executes a constant number of times.
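As a rough illustration of the constant-trip-count case, the sketch below shows a loop and the straight-line code a compiler could produce by unrolling it. The function names are hypothetical, and Python itself does not perform this transformation; the two functions are written by hand only to show that they are equivalent.

```python
def sum4_loop(v):
    """Loop with a trip count (4) that is known before the program runs."""
    total = 0.0
    for i in range(4):
        total += v[i]
    return total


def sum4_unrolled(v):
    """The same computation with the loop unrolled: no branch remains."""
    total = 0.0
    total += v[0]
    total += v[1]
    total += v[2]
    total += v[3]
    return total


values = [1.0, 2.0, 3.0, 4.0]
print(sum4_loop(values) == sum4_unrolled(values))
```

Because the loop bound does not depend on any runtime data, the unrolled form needs no control flow at all, which is why hardware without dynamic branching can still run it.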

Note that comparison and branching are not necessarily the same thing; consider the assembly operations SLT and SGE (set-if-less-than and set-if-greater-or-equal). These are comparison operations that act exactly like normal arithmetic operations: they take a couple of operands, compare them, and produce a numeric result (0 or 1). Another alternative to dynamic branching is predication, in which a single instruction at a time is executed conditionally without actually branching around that instruction. NV3x-class hardware supports both of these things.
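The idea of a comparison that yields a number can be sketched as follows. The functions `slt` and `sge` model the semantics described above (a 0 or 1 result); `select_without_branch` and `raise_to_floor` are hypothetical helpers showing one common way such a result is used, namely blending two already-computed values. This arithmetic blend is an illustration of comparison without branching, not a model of hardware predication itself.

```python
def slt(a: float, b: float) -> float:
    """Set-if-less-than: 1.0 when a < b, otherwise 0.0."""
    return float(a < b)


def sge(a: float, b: float) -> float:
    """Set-if-greater-or-equal: 1.0 when a >= b, otherwise 0.0."""
    return float(a >= b)


def select_without_branch(mask: float, if_one: float, if_zero: float) -> float:
    """Blend two precomputed values using a 0/1 mask instead of a jump."""
    return mask * if_one + (1.0 - mask) * if_zero


def raise_to_floor(x: float, floor: float) -> float:
    """Return floor when x < floor, else x, using only compare and arithmetic."""
    return select_without_branch(slt(x, floor), floor, x)


print(raise_to_floor(0.25, 1.0), raise_to_floor(3.0, 1.0))
```

Both candidate values are always computed, so this approach trades extra arithmetic for the absence of a branch. It also assumes finite inputs, since multiplying an infinity or NaN by a zero mask does not yield zero.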

Framebuffer Object (FBO). This is an OpenGL extension that defines a simple interface for drawing to rendering destinations other than the buffers provided to the GL by the window system.

See: ARB_framebuffer_object specification

GLX is the window system interface between OpenGL and X11.

General-Purpose computation on Graphics Processing Units (GPGPU).

Multiple Render Targets (MRT). This is a feature that allows a GPU program to write to multiple color buffers simultaneously. In OpenGL, this functionality was introduced by the ATI_draw_buffers extension, which was later renamed to ARB_draw_buffers and finally promoted into the core GL in OpenGL 2.0.

See also: MRT tutorial

Pixel Buffer Object (PBO). This is an OpenGL extension that expands on the interface provided by buffer objects. It is intended to permit buffer objects to be used not only with vertex array data, but also with pixel data.

See: ARB_pixel_buffer_object specification

pbuffers are offscreen rendering buffers for an OpenGL renderer. They are commonly used in GPGPU to achieve RTT (on Windows) or CTT (on X11 or Windows). They are also used because they can store floating point values, whereas the default framebuffer cannot. Due to their complexity and inefficiency (and a lack of pbuffer RTT with GLX), however, pbuffers are now largely deprecated in favor of FBO.

See also: WGL_ARB_pbuffer specification

Ping-ponging is a technique used with RTT to avoid reading and writing the same buffer simultaneously; instead, rendering bounces back and forth between a pair of buffers. Such a technique is often required in iterative algorithms that write data in one pass and then read that data back to generate the results of the next pass. In pass 1, data is written into buffer A; in pass 2, buffer A is bound for reading and buffer B is written. If a third pass is required, buffer B becomes the source and buffer A becomes the destination.
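The buffer swapping described above can be sketched on the CPU with two plain lists standing in for the two render targets. The names `smooth_pass` and `run_passes`, and the three-point averaging used as the per-pass computation, are assumptions made for this illustration; on a GPU each pass would be a draw call with one texture bound for reading and the other attached for writing.

```python
def smooth_pass(src, dst):
    """One pass: read only from src, write only to dst (3-point average)."""
    n = len(src)
    if n == 0:
        return
    # Boundary elements are copied through unchanged.
    dst[0] = src[0]
    dst[n - 1] = src[n - 1]
    for i in range(1, n - 1):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0


def run_passes(initial, passes):
    """Run several passes, swapping the roles of the two buffers each time."""
    buf_a = list(initial)           # buffer A starts with the input data
    buf_b = [0.0] * len(initial)    # buffer B is scratch space
    src, dst = buf_a, buf_b
    for _ in range(passes):
        smooth_pass(src, dst)
        src, dst = dst, src         # the destination becomes the next source
    return src                      # after the swap, src holds the latest result


print(run_passes([0.0, 3.0, 0.0], 1))
```

No pass ever reads from the buffer it is writing to, which is the property ping-ponging exists to guarantee. Swapping references rather than copying data mirrors how the two buffers simply exchange roles between passes.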

Predication is a strategy in computer architecture design for mitigating the costs usually associated with conditional branches, particularly branches to short sections of code. It does this by allowing each instruction to conditionally either perform an operation or do nothing. Predication appeared in NVIDIA GPUs as a means of conditional execution substantially earlier than true dynamic branching did.

See also: Wikipedia article on branch predication

Render To Texture (RTT). This is a method of using rendered pixel data as a texture: the rendered pixel buffer is bound directly as a texture. This is generally preferred over the alternative method, CTT, when it is available, since it is more efficient. With OpenGL on Windows, RTT is supported with pbuffers; it is supported in a platform-independent way by the newer (and preferred) FBO extension.


Vertex Buffer Object (VBO). This is an OpenGL extension that defines an interface that allows various types of data (especially vertex array data) to be cached in high-performance graphics memory, thereby increasing the rate of data transfers. Buffer objects were promoted from this extension into the core GL in OpenGL 1.5.

See: ARB_vertex_buffer_object specification

Vertex Texture Fetch (VTF). Shader Model 3.0 GPUs are capable of performing texture reads from within a vertex program. At the time this glossary was written, only NVIDIA GeForce 6 series and GeForce 7 series GPUs had this ability. See NVIDIA’s Using Vertex Textures whitepaper for details and limitations of VTF.

WGL is the window system interface between OpenGL and Microsoft Windows.