## Shader Maker: a simple, truly cross-platform GLSL editor

April 20th, 2008

Shader Maker is a simple, cross-platform GLSL editor. It works on Windows, Linux, and Mac OS X. Shader Maker provides the basics of a shader editor, such that students can get started with writing their own shaders as quickly as possible. This includes: syntax highlighting in the GLSL editors; vertex, fragment, and geometry shader editors; interactive editing of uniform variables; light source parameters; pre-defined simple shapes (e.g., torus); a simple OBJ loader; and more. (Shader Maker)

## SHARCNET Symposium on GPU and CELL Computing

April 20th, 2008

University of Waterloo

Waterloo, Ontario, Canada

May 27th 2008

This one-day symposium will explore the use of GPUs, CELL processors, FPGAs and multi-core CPUs for large-scale scientific computing. The symposium program includes invited talks on the LANL Roadrunner CELL supercomputer, the RapidMind platform for multicore CPUs and many-core accelerators, and NVIDIA CUDA. For more information, see http://www.sharcnet.ca/events/ssgc2008/

## gDEBugger V4.0 Adds Linux Support and a Buffer Viewer

April 2nd, 2008

The new gDEBugger V4.0 introduces gDEBugger Linux. This exciting new product adds 32-bit and 64-bit Linux support, bringing all of gDEBugger’s debugging and profiling abilities to the Linux OpenGL developers’ world. A new Texture and Buffer Viewer has been added. This viewer allows you to view textures, static buffers and pbuffers as images or as raw data in their original format, including non-RGB data formats (float, depth, integer, luminance, etc.). This version also includes significant performance improvements. gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API to let programmers see what is happening within the graphics system implementation, so they can find bugs and optimize OpenGL application performance. (http://www.gremedy.com)

## CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

April 2nd, 2008

The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two biological sequences; as a result, it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. This paper by Svetlin Manavski and Giorgio Valle describes SmithWaterman-CUDA, an open-source project to perform fast sequence alignment on the GPU. Although the software performs the optimal Smith-Waterman alignment, it is faster than heuristic approaches like FASTA and BLAST. Tests on protein data banks show up to a 30x speedup relative to reference CPU implementations. (Svetlin A. Manavski, Giorgio Valle, CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment, BMC Bioinformatics 2008, 9(Suppl 2):S10 (26 March 2008))
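To make the quadratic cost concrete, here is a minimal CPU sketch of the Smith-Waterman scoring recurrence in Python. It is not the SmithWaterman-CUDA implementation: the function name, the simple match/mismatch scores and the linear gap penalty are all assumptions chosen for illustration, whereas real protein alignment typically uses a substitution matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = h[i - 1][j - 1] + substitution  # align a[i-1] with b[j-1]
            up = h[i - 1][j] + gap                     # gap in b
            left = h[i][j - 1] + gap                   # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best
```

Every cell of the `len(a) x len(b)` table is filled once, which is where the cost proportional to the product of the sequence lengths comes from. Cells on the same anti-diagonal do not depend on each other, which is one common source of parallelism for this recurrence.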

## Relational Joins on Graphics Processors

April 2nd, 2008

Abstract: “We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.” (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.)
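For readers unfamiliar with the join algorithms named in the abstract, the following is a minimal sequential sketch of an equi hash join in Python. It only illustrates the classic build and probe phases on the CPU; it is not the authors' GPU design, which is built from data-parallel primitives such as split and sort. The function name, the tuple-based rows and the key-index parameters are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key=0, probe_key=0):
    """Equi-join two lists of tuples on the given column indices.

    Hypothetical helper for illustration; rows are plain tuples.
    """
    # Build phase: index the (ideally smaller) relation by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the other relation in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            joined.append(match + row)  # concatenate the two matching tuples
    return joined
```

Calling `hash_join([(1, "Ann"), (3, "Bob")], [(1, "apple"), (2, "pear")])` pairs the customer with key 1 with the order carrying the same key and drops the unmatched rows.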

## A SIMD interpreter for Genetic Programming on GPU Graphics Cards

April 2nd, 2008

Abstract: Mackey-Glass chaotic time series prediction and nuclear protein classification show the feasibility of evaluating genetic programming populations directly on parallel consumer gaming graphics processing units. Using a Linux KDE computer equipped with an NVIDIA GeForce 8800 GTX graphics processing unit card the C++ SPMD interpreter evolves programs at Giga GP operations per second (895 million GPops). We use the RapidMind general processing on GPU (GPGPU) framework to evaluate an entire population of a quarter of a million individual programs on a non-trivial problem in 4 seconds. An efficient reverse polish notation (RPN) tree based GP is given. (A SIMD interpreter for Genetic Programming on GPU Graphics Cards. W.B. Langdon and W. Banzhaf. In M. Neill, L. Vanneschi, A.I. Esparcia Alcazar, S. Gustafson eds., EuroGP 2008, pp73-85. Springer, LNCS 4971, 26-28 March, Naples.)
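The reverse polish notation representation mentioned in the abstract can be evaluated with a simple stack machine. The Python sketch below shows the idea on the CPU for a single input value; it is not the paper's RapidMind interpreter. The token format, the `x` input terminal and the three-operator function set are assumptions chosen to keep the example small.

```python
import operator

# Illustrative function set (an assumption, not the paper's).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rpn(program, x):
    """Evaluate one RPN program (a list of string tokens) for input x."""
    stack = []
    for token in program:
        if token in OPS:
            right = stack.pop()   # operands come off in reverse order
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token == "x":
            stack.append(x)       # the program's input terminal
        else:
            stack.append(float(token))  # numeric constant
    return stack.pop()


def eval_population(population, x):
    """Run the same interpreter loop over every program in the population."""
    return [eval_rpn(program, x) for program in population]
```

Because every program is interpreted by the same loop and differs only in its token data, evaluating a population is a natural fit for data-parallel hardware, with one program (or one fitness case) per thread.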

## GPGPU Based Image Segmentation Livewire Algorithm Implementation

April 1st, 2008

This thesis presents a GPU implementation of the Livewire algorithm. The algorithm is divided into three phases: Sobel or Laplacian filter convolution, modeling the image as a grid graph, and solving the single-source shortest path problem with non-negative edge weights. To calculate the shortest path, an adapted version of the delta-stepping algorithm was developed for GPUs, using CUDA. A critical analysis of the results shows large speedups for the image filtering algorithms. On the other hand, the heavy use of dependent device memory look-ups has kept the delta-stepping algorithm from achieving higher performance than the CPU implementation, although better performance is expected for wider graphs. Besides showing the viability of implementing the Livewire algorithm on the GPU, this thesis makes available an open-source GPU-based image segmentation application, which can be used as an example for future GPU algorithm implementations, at http://code.google.com/p/gpuwire/.
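As an illustration of the first phase, here is a minimal CPU sketch of a 3x3 Sobel filter in Python that returns the gradient magnitude per pixel. It is not code from the gpuwire project: the function name, the list-of-lists image format and the choice to leave border pixels at zero are assumptions made for this example.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    img is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 (an assumption; other border policies are possible).
    """
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Kernels are applied without flipping (correlation); this only
            # changes the sign of gx and gy, not the magnitude.
            for dy in range(3):
                for dx in range(3):
                    pixel = img[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * pixel
                    gy += KY[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out
```

Each output pixel depends only on a fixed 3x3 neighborhood of the input, so all pixels can be computed independently. That independence is consistent with the large speedups the thesis reports for the filtering phase, in contrast to the dependent look-ups of the shortest-path phase.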

## Quantum Chemistry on GPUs

April 1st, 2008

Ivan Ufimtsev and Todd Martínez at the University of Illinois at Urbana-Champaign have implemented an efficient method of calculating two-electron repulsion integrals over Gaussian basis functions on the GPU. Virtually all modern quantum chemical calculations require evaluating millions to billions of these integrals. This problem turns out to be well-suited to the massively parallel architecture of GPUs by an appropriate partitioning of the problem. A benchmark test performed for the evaluation of approximately one million (ss|ss) integrals over contracted s-orbitals showed that a naïve algorithm implemented on the GPU achieves up to 130-fold speedup over a traditional CPU implementation on an AMD Opteron. Subsequent calculations on a 256-atom DNA strand show that the GPU advantage is maintained for basis sets including higher angular momentum functions. (Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation, Ivan S. Ufimtsev and Todd J. Martínez, *J. Chem. Theory Comput.*, 4 (2), 222-231, 2008. doi:10.1021/ct700268q)

## A Flexible Kernel for Adaptive Mesh Refinement on GPU

April 1st, 2008

This paper by Boubekeur (TU Berlin) and Schlick (INRIA) presents a flexible GPU kernel for adaptive on-the-fly refinement of meshes with arbitrary topology. By simply reserving a small amount of GPU memory to store a set of adaptive refinement patterns, on-the-fly refinement is performed by the GPU, without any preprocessing or additional topology data structure. The level of adaptive refinement can be controlled by specifying a per-vertex depth tag, in addition to the usual position, normal, color and texture coordinates. This depth tag is used by the kernel to instantiate the correct refinement pattern. Finally, the refined patch produced for each triangle can be displaced by the vertex shader, using any kind of geometric refinement, such as Bezier patch smoothing, scalar valued displacement, procedural geometry synthesis or subdivision surfaces. This refinement engine requires no multi-pass rendering, fragment processing, or special preprocessing of the input mesh structure. It can be implemented on any GPU with vertex shading capabilities. (A Flexible Kernel for Adaptive Mesh Refinement on GPU, Tamy Boubekeur and Christophe Schlick, Computer Graphics Forum, 2008.)

## Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units

February 11th, 2008

In this paper we describe a modification of a general purpose code for quantum mechanical calculations of molecular properties (Q-Chem) to use a graphical processing unit. We report a 4.3x speedup of the resolution-of-the-identity second-order Møller-Plesset perturbation theory execution time for single point energy calculation of linear alkanes. Furthermore, we obtain the correlation and total energy for n-octane conformers as the torsional angle of the central bond is rotated to show that precision is not lost for these types of calculations. This code modification is accomplished using the NVIDIA CUDA Basic Linear Algebra Subprograms (CUBLAS) library for an NVIDIA Quadro FX 5600 graphics card. Finally, we anticipate further speedups of other matrix algebra based electronic structure calculations using a similar approach. (Accelerating Resolution-of-the-Identity Second-Order Møller-Plesset Quantum Chemistry Calculations with Graphical Processing Units. Vogt, L., Olivares-Amaya, R., Kermes, S., Shao, Y., Amador-Bedolla, C., and Aspuru-Guzik, A. *J. Phys. Chem. A*, 2008, DOI: 10.1021/jp0776762)