This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common operation in spatial databases with many applications. A similarity join takes two sets of points *A* and *B* and returns all pairs *p* ∈ *A*, *q* ∈ *B* for which the distance *D(p,q) ≤ ε*. The paper presents an algorithm named LSS that executes on a GPU, taking advantage of the GPU’s parallelism and large data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiencies and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)
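The sort-and-search idea can be illustrated with a much simpler CPU sketch. This is not LSS: it assumes a single sort on the first coordinate instead of space-filling curves, runs sequentially, and the function name `similarity_join` is hypothetical. It only shows how a sorted array plus an interval search can narrow the candidate pairs before the exact distance test.

```python
from bisect import bisect_left, bisect_right
from math import dist


def similarity_join(A, B, eps):
    """Return all (p, q) with p in A, q in B and Euclidean distance <= eps.

    Simplified stand-in for the sort-and-search idea (not the LSS algorithm):
    B is sorted on its first coordinate, and each p only inspects the
    interval of B whose first coordinate lies within eps of p's.
    """
    # Sorting tuples orders them by first coordinate (ties by later ones).
    B_sorted = sorted(B)
    keys = [q[0] for q in B_sorted]
    result = []
    for p in A:
        # Any q within eps of p must have |q[0] - p[0]| <= eps.
        lo = bisect_left(keys, p[0] - eps)
        hi = bisect_right(keys, p[0] + eps)
        for q in B_sorted[lo:hi]:
            if dist(p, q) <= eps:  # exact test on the candidates
                result.append((p, q))
    return result


A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.5, 0.0), (0.0, 3.0), (5.0, 5.5), (9.0, 9.0)]
pairs = similarity_join(A, B, eps=1.0)
```

The interval is a superset of the true matches, so the final distance check is still required; the sort only limits how many pairs reach that check.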

## A Fast Similarity Join Algorithm Using Graphics Processing Units

May 25th, 2008

## Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs

May 25th, 2008

This paper by Cabido et al. presents a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1100% faster than the CPU version of the algorithm. (Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs. Raul Cabido, Antonio S. Montemayor, Juan Jose Pantrigo, and Bryson R. Payne. *Machine Vision and Applications*, Springer, 2008.)

## GPU acceleration of cutoff pair potentials for molecular modeling applications

May 25th, 2008

The advent of systems biology requires the simulation of ever-larger biomolecular systems, demanding a commensurate growth in computational power. This paper examines the use of the NVIDIA Tesla C870 graphics card programmed through the CUDA toolkit to accelerate the calculation of cutoff pair potentials, one of the most prevalent computations required by many different molecular modeling applications. The paper presents algorithms to calculate electrostatic potential maps for cutoff pair potentials. Whereas a straightforward approach for decomposing atom data leads to low computational efficiency, a new strategy enables fine-grained spatial decomposition of atom data that maps efficiently to the C870’s memory system while increasing work efficiency of atom data traversal by a factor of 5. The memory addressing flexibility exposed through CUDA’s SPMD programming model is crucial in enabling this new strategy. An implementation of the new algorithm provides a greater than threefold performance improvement over our previously published implementation and runs 12 to 20 times faster than optimized CPU-only code. The lessons learned are generally applicable to algorithms accelerated by uniform grid spatial decomposition. (C. I. Rodrigues, D. J. Hardy, J. E. Stone, K. Schulten, W. W. Hwu, GPU acceleration of cutoff pair potentials for molecular modeling applications. Proceedings of the 2008 Conference On Computing Frontiers, pp.273-282, 2008.) (http://www.ks.uiuc.edu/Research/gpu/)
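A minimal CPU sketch of uniform grid spatial decomposition for a cutoff pair potential follows. It is not the paper's GPU algorithm: it assumes a bare Coulomb-like term q/r with no units or smoothing, a cell side equal to the cutoff, and the hypothetical helper names `build_grid` and `potential_at`. It only shows why binning atoms into cells lets each evaluation point visit a small neighborhood instead of every atom.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def build_grid(atoms, cell):
    """Bin atoms (x, y, z, charge) into cubic cells of side `cell`."""
    grid = defaultdict(list)
    for x, y, z, q in atoms:
        key = (floor(x / cell), floor(y / cell), floor(z / cell))
        grid[key].append((x, y, z, q))
    return grid


def potential_at(point, grid, cutoff):
    """Sum q / r over atoms within `cutoff` of `point`.

    Assumes the grid was built with cell side == cutoff, so every atom
    within the cutoff lies in one of the 27 cells around the point.
    """
    cx, cy, cz = (floor(c / cutoff) for c in point)
    total = 0.0
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        # .get avoids creating empty cells in the defaultdict.
        for x, y, z, q in grid.get((cx + dx, cy + dy, cz + dz), ()):
            r = dist(point, (x, y, z))
            if 0.0 < r <= cutoff:  # skip coincident points and far atoms
                total += q / r
    return total


atoms = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 2.0), (10.0, 0.0, 0.0, 5.0)]
grid = build_grid(atoms, cell=2.0)
v = potential_at((0.5, 0.0, 0.0), grid, cutoff=2.0)
```

The third atom sits in a distant cell and is never visited, which is the work saving the decomposition is meant to provide.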

## GPU Computing

May 25th, 2008

Abstract: “The graphics processing unit (GPU) has become an integral part of today’s mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU’s rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.” (J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, J. C. Phillips, “GPU Computing”, Proceedings of the IEEE, vol.96, no.5, pp.879-899, May 2008)

## CIGPU 5 June 2008 Hong Kong additional technical discussion

May 25th, 2008

In addition to the papers already announced, Dr. Simon Harding (Memorial University, Newfoundland) and Dr. Tien-Tsin Wong (The Chinese University of Hong Kong) will lead a discussion on the practicalities of running evolution on modern graphics cards. They will contrast the current leading GPGPU tools considering ease of use, and support for debugging and performance monitoring. CIGPU will close with a short session considering the future of computational intelligence on GPUs.

## Graph Layout on the GPU

May 25th, 2008

A graph is an ordered pair G=(V,E) where V is a set of nodes and E is a set of edges connecting nodes. Graph drawing addresses the problem of creating geometric representations of graphs. Unlike matrices or images, graphs are unstructured and hence graph layout may not seem to be suitable for acceleration on the GPU. These papers present two GPU-accelerated graph drawing algorithms which are able to quickly compute aesthetic layouts of large graphs. One is for the layout of a single graph and the other is for computing stable layouts of a sequence of graphs. Speedups of 5.5x to 17x relative to a CPU implementation are demonstrated. (Yaniv Frishman and Ayellet Tal, Multi-Level Graph Layout on the GPU, IEEE Transactions on Visualization and Computer Graphics (Proceedings Information Visualization 2007), 13(6):1310-1317, 2007)

(Yaniv Frishman and Ayellet Tal, Online Dynamic Graph Drawing, accepted to IEEE Transactions on Visualization and Computer Graphics)

## gDEBugger V4.1 Adds Geometry Shaders Support and new ATI Performance Metrics Integration

May 25th, 2008

The new gDEBugger V4.1 adds Geometry Shader Support and enables developers to view allocated geometry shader objects, shader source code and properties. It also allows the developer to Edit and Continue shaders *on the fly*. Support for the new ATI (AMD) driver performance metrics infrastructure has been added. This integration enables users to view ATI performance metrics such as hardware utilization, vertex wait for pixel, pixel wait for vertex, overdraw and more. These performance metrics together with gDEBugger’s Performance Analysis Toolbar provide a powerful solution for locating graphics system performance bottlenecks. gDEBugger, an OpenGL and OpenGL ES debugger and profiler, traces application activity on top of the OpenGL API, letting programmers see what is happening within the graphics system implementation to find bugs and optimize OpenGL application performance. gDEBugger runs on Microsoft Windows and Linux operating systems. (http://www.gremedy.com)

## PRACE award presented to young scientist at ISC’08 for GPGPU work

May 20th, 2008

From this article: “PRACE, Partnership for Advanced Computing in Europe, awarded a prize for the best scientific paper submitted to ISC’08 by a European student or young scientist on petascaling. The authors of the award winning paper are Stefan Turek, Dominik Göddeke, Christian Becker, Sven H.M. Buijssen and Hilmar Wobker from the Institute of Applied Mathematics, Dortmund University of Technology, Germany. Their work, UCHPC : UnConventional High Performance Computing for Finite Element Simulations, was selected by the ISC’08 Award Committee, headed by Michael Resch, High Performance Computing Center Stuttgart. Achim Bachem, Chairman of the Board Forschungszentrum Jülich and PRACE coordinator presented the PRACE Award at the ISC’08 opening ceremony in Dresden on Wednesday, 18 June. Dominik Göddeke, Ph.D. student in the team of Professor Stefan Turek will receive a sponsorship for the participation in a conference relevant to Petascale computing.” Dominik has been an active GPGPU researcher for several years, and is one of the most active and helpful contributors to the GPGPU.org forums. (PRACE award presented to young scientist at ISC’08)

## GRIP – A Rugged GPU Accelerated Image Processing System

April 23rd, 2008

Vision4ce launched a new line of General-purpose Rugged Image Processing (GRIP) products at the recent SPIE Defense and Security Symposium in Orlando, held 18th-20th March 2008. The GRIP-Beta showed cutting-edge GPGPU-based image processing demonstrations, analog and Gigabit Ethernet video streams, and the robust functionality of the Gripworkx image processing framework. With GRIP, the Vision4ce team now addresses numerous rugged embedded computing challenges with a cost-effective, readily available rugged solution, where such challenges might normally be served by more expensive and lengthy FPGA approaches. See www.vision4ce.com for more information.

## CUDPP 1.0a Adds Segmented Scan and Sparse Matrix-Vector Multiplication

April 20th, 2008

Version 1.0 alpha of CUDPP, the CUDA Data-Parallel Algorithms Library, has been released. This version adds the segmented scan algorithm and sparse matrix-vector multiplication to CUDPP’s repertoire. Other new features include an improved “plan”-based configuration interface, an improved scan algorithm for higher performance, support for more inclusive scans and more scan operators, and an improved stream compaction interface. In addition, CUDPP 1.0a adds support for CUDA 2.0 and the Windows Vista and Mac OS X (10.5.2 and higher) operating systems. CUDPP works with NVIDIA CUDA versions 1.1 and higher.
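For readers unfamiliar with the two new primitives, here is a sequential reference sketch of what they compute. It does not use or reproduce the CUDPP API: the function names are hypothetical, the scan operator is assumed to be addition, and the sparse matrix is assumed to be stored in CSR form.

```python
def segmented_inclusive_scan(values, flags):
    """Inclusive sum-scan that restarts wherever flags[i] is truthy.

    A flag marks the first element of a new segment.
    """
    out = []
    running = 0
    for v, f in zip(values, flags):
        running = v if f else running + v
        out.append(running)
    return out


def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = M @ x for a matrix M given in CSR form.

    Row r owns the entries vals[row_ptr[r]:row_ptr[r + 1]], whose column
    positions are stored at the same offsets in col_idx.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        y.append(sum(vals[k] * x[col_idx[k]] for k in range(start, end)))
    return y


scan = segmented_inclusive_scan([1, 2, 3, 4, 5], [1, 0, 1, 0, 0])

# M = [[1, 0, 2],
#      [0, 3, 0]]
y = csr_spmv(row_ptr=[0, 2, 3], col_idx=[0, 2, 1], vals=[1, 2, 3], x=[1, 1, 1])
```

The two operations are related: each row of a CSR matrix is a segment of the per-entry products, so the row sums in `csr_spmv` amount to a segmented sum over those products.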