Workshop on GPU Programming for Molecular Modeling, August 6-8, 2010, University of Illinois

June 18th, 2010
GPU-Accelerated Ion Placement

The Theoretical and Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Bioinformatics (www.ks.uiuc.edu) at the University of Illinois at Urbana-Champaign, presents a Workshop on GPU Programming for Molecular Modeling to be held August 6-8, 2010, at the Beckman Institute for Advanced Science and Technology, on the University of Illinois campus in Urbana, Illinois, USA. Application, selection, and notification of participants are ongoing through July 29, 2010.

Note: Participants are encouraged to attend the multi-site “Proven Algorithmic Techniques for Many-core Processors” workshop the preceding week (August 2-6) at the location of their choice. Registration for this workshop is required for participants without equivalent GPU-programming training or experience.

“Believe it or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”

May 30th, 2010

Abstract:

In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.

(Rajesh Bordawekar, Uday Bondhugula and Ravi Rao: “Believe It or Not! Multi-core CPUs Can Match GPU Performance for FLOP-intensive Application!”. Technical Report RC24982, IBM Thomas J. Watson Research Center, Apr. 2010.)
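
To make the O(n^4) cost mentioned in the abstract concrete, here is a minimal sketch of a direct 2-D cross-correlation in plain Python. It is not the authors' implementation: the function names `cross_correlate` and `best_shift`, the zero padding outside the image, and the use of nested lists are all assumptions made for illustration. For n x n inputs there are (2n-1)^2 candidate shifts and up to n^2 multiply-adds per shift, which is where the fourth-power cost comes from.

```python
def cross_correlate(image, reference):
    """Direct 2-D cross-correlation of two equally sized matrices.

    Illustrative only: pixels outside the image are treated as zero
    (an assumption, not taken from the report). The result has
    (2*rows - 1) x (2*cols - 1) entries, one per shift (dy, dx).
    """
    rows, cols = len(image), len(image[0])
    scores = []
    for dy in range(-(rows - 1), rows):          # vertical shift
        score_row = []
        for dx in range(-(cols - 1), cols):      # horizontal shift
            acc = 0.0
            # Visit only reference pixels whose shifted partner lies
            # inside the image; everything else contributes zero.
            for y in range(max(0, -dy), min(rows, rows - dy)):
                for x in range(max(0, -dx), min(cols, cols - dx)):
                    acc += image[y + dy][x + dx] * reference[y][x]
            score_row.append(acc)
        scores.append(score_row)
    return scores


def best_shift(image, reference):
    """Return the shift (dy, dx) with the highest correlation score."""
    rows, cols = len(image), len(image[0])
    scores = cross_correlate(image, reference)
    _, i, j = max(
        (scores[i][j], i, j)
        for i in range(2 * rows - 1)
        for j in range(2 * cols - 1)
    )
    # Convert array indices back to signed shifts.
    return i - (rows - 1), j - (cols - 1)
```

The innermost two loops form the dot-product that the abstract refers to. In a tuned version that is the part one would vectorize (SSE/VSX) or map to GPU threads, while the outer two loops over shifts are independent of each other and can be split across cores.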

Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors

February 28th, 2010

Abstract:

We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show that our code scales well on an MPI-based cluster; that an eightfold speedup can be achieved using modern GPUs in contrast to multithreaded CPU code and, finally, that it is possible to solve fluid-structure interaction scenarios with high resolution at interactive rates.

(Markus Geveler, Dirk Ribbrock, Dominik Göddeke and Stefan Turek: “Lattice-Boltzmann Simulation of the Shallow-Water Equations with Fluid-Structure Interaction on Multi- and Manycore Processors”, Accepted in: Facing the Multicore Challenge, Heidelberg, Germany, Mar. 2010. Link.)

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

February 2nd, 2010

Abstract:

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI’s libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI’s SSE backend, and an additional 3–4 and 4–16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

(Danny van Dyk, Markus Geveler, Sven Mallach, Dirk Ribbrock, Dominik Göddeke and Carsten Gutwenger: HONEI: A collection of libraries for numerical computations targeting multiple processor architectures. Computer Physics Communications 180(12), pp. 2534-2543, December 2009. DOI 10.1016/j.cpc.2009.04.018)

JVSP Special Issue on Multicore Enabled Multimedia Applications & Architectures

July 17th, 2007

The trend toward multicore processors brings a paradigm shift in application development. Traditionally, increasing clock frequency was one of the main ways for conventional processors to achieve higher performance, and application developers could improve the performance of their applications by just waiting for faster processor platforms. Today, increasing clock frequency has reached a point of diminishing returns, and even negative returns if power is taken into account. Multicore processors, also known as chip multiprocessors (CMPs), promise a power-efficient way to increase performance and are becoming more prevalent in vendors’ solutions, for example, IBM Cell Broadband Engine processors, Intel Core 2 Duo processors, and so on. However, the application or algorithm development process must change significantly in order to fully exploit the potential of multicore processors. This special issue of the Journal of VLSI Signal Processing Systems will discuss related challenges, issues, case studies, and solutions, focusing especially on multimedia-related applications, architectures, and programming environments, for example, understanding the complexity of developing a new application or porting an existing application onto a multicore processor. (Call for papers)

Workshop: Data-Parallel Programming Models for Many-Core Architectures

March 7th, 2007

Data-parallel programming models are emerging as an extremely attractive model for parallel programming, driven by several factors. Through deterministic semantics and constrained synchronization mechanisms, they provide race-free parallel-programming semantics. Furthermore, data-parallel programming models free programmers from reasoning about the details of the underlying hardware and software mechanisms for achieving parallel execution and facilitate effective compilation. Finally, efforts in the GPGPU movement and elsewhere have matured implementation technologies for streaming and data-parallel programming models to the point where high performance can be reliably achieved.

This workshop gathers commercial and academic researchers, vendors, and users of data-parallel programming platforms to discuss implementation experience for a broad range of many-core architectures and to speculate on future programming-model directions. Participating institutions include AMD, Electronic Arts, Intel, Microsoft, NVIDIA, PeakStream, RapidMind, and The University of New South Wales. (Link to Call for Participation, Data-Parallel Programming Models for Many-Core Architectures)