November 23rd, 2009
The 1.0 Beta version of OpenMM has just been released. OpenMM is a freely downloadable, high performance, extensible library that allows molecular dynamics (MD) simulations to run on high performance computer architectures, such as graphics processing units (GPUs). It currently supports NVIDIA GPUs and provides preliminary support for the new cross-platform, parallel programming standard OpenCL, which will enable it to be used on ATI GPUs.
The new release includes support for Particle Mesh Ewald and custom non-bonded interactions. In conjunction with this release, a new version of the code needed for accelerating the GROMACS molecular dynamics software using OpenMM is also available.
OpenMM is a collaborative project between Vijay Pande’s lab at Stanford University and Simbios, the National Center for Physics-based Simulation of Biological Structures at Stanford, which is supported by the National Institutes of Health. For more information on OpenMM, visit http://simtk.org/home/openmm.
November 19th, 2009
We report a parallel Monte Carlo algorithm accelerated by graphics processing units (GPUs) for modeling time-resolved photon migration in arbitrary 3D turbid media. By taking advantage of the massively parallel threads and low memory latency, this algorithm allows many photons to be simulated simultaneously in a GPU. To further improve the computational efficiency, we explored two parallel random number generators (RNGs), including a floating-point-only RNG based on a chaotic lattice. An efficient scheme for boundary reflection was implemented, along with the functions for time-resolved imaging. For a homogeneous semi-infinite medium, good agreement was observed between the simulation output and the analytical solution from the diffusion theory. The code was implemented in the CUDA programming language, and benchmarked under various parameters, such as thread number, selection of RNG and memory access pattern. With a low-cost graphics card, this algorithm has demonstrated an acceleration ratio above 300 when using 1792 parallel threads over conventional CPU computation. The acceleration ratio drops to 75 when using atomic operations. These results render the GPU-based Monte Carlo simulation a practical solution for data analysis in a wide range of diffuse optical imaging applications, such as human brain or small-animal imaging.
(Qianqian Fang and David A. Boas, “Monte Carlo Simulation of Photon Migration in 3D Turbid Media Accelerated by Graphics Processing Units,” Opt. Express, vol. 17, issue 22, pp. 20178-20190 (2009), doi:10.1364/OE.17.020178. A free software package, Monte Carlo eXtreme (MCX), is also available at http://mcx.sourceforge.net.)
October 1st, 2009
In this work we describe a GPU implementation of an individual-based model of fish schooling. In this model, each fish aligns its position and orientation with an appropriate average of its neighbors’ positions and orientations, which carries a very high computational cost in the so-called nearest-neighbor search. By leveraging the GPU’s processing power and the CUDA programming model, we implement an efficient framework that makes it possible to simulate the collective motion of high-density individual groups. As a case study, we present a simulation of the motion of millions of fish. We describe our implementation and present extensive experiments that demonstrate the effectiveness of our GPU implementation.
(Ugo Erra, Bernardino Frola, Vittorio Scarano, Iain Couzin, An efficient GPU implementation for large scale individual-based simulation of collective behavior. Proceedings of High Performance Computational Systems Biology (HiBi09), October 14-16, 2009, Trento, Italy.)
September 29th, 2009
General-purpose application development for GPUs (GPGPU) has recently gained momentum as a cost-effective approach for accelerating data- and compute-intensive applications. It has been driven by the introduction of C-based programming environments such as NVIDIA’s CUDA, OpenCL, and Intel’s Ct. While significant effort has been focused on developing and evaluating applications and software tools, comparatively little has been devoted to the analysis and characterization of applications to assist future work in compiler optimizations, application re-structuring, and micro-architecture design.
This paper proposes a set of metrics for GPU workloads and uses these metrics to analyze the behavior of GPU programs. We report on an analysis of over 50 kernels and applications including the full NVIDIA CUDA SDK and UIUC’s Parboil Benchmark Suite covering control flow, data flow, parallelism, and memory behavior. The analysis was performed using a full function emulator we developed that implements the NVIDIA virtual machine referred to as PTX (Parallel Thread eXecution architecture) – a machine model and low-level virtual ISA that is representative of ISAs for data-parallel execution. The emulator can execute compiled kernels from the CUDA compiler, currently supports the full PTX 1.4 specification, and has been validated against the full CUDA SDK. The results quantify the importance of optimizations such as those for branch re-convergence and the prevalence of sharing between threads, and highlight opportunities for additional parallelism.
(Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, A Characterization and Analysis of PTX Kernels. International Symposium on Workload Characterization (IISWC). 2009.)
September 29th, 2009
If you can’t make it to NVIDIA’s inaugural GPU Technology Conference, taking place Sept. 30 to Oct. 2, 2009 in San Jose, CA, you can watch a live webcast online.
Links for the live webcast, event coverage complete with blogs, photos and video interviews, and more details around the conference, including conference schedule, session abstracts and speaker bios can be found at www.nvidia.com/gtc.
The schedule of live webcasts is as follows:
- Wed. Sept 30 – 1:00 PM to 2:30 PM: Opening Keynote with Jen-Hsun Huang, CEO and Co-Founder, NVIDIA
- Wed. Sept 30 – 3:00 PM to 4:15 PM: General Session on Important Trends in Visual Computing
- Wed. Sept 30 – 4:30 PM to 5:45 PM: General Session on Breakthroughs in High Performance Computing
- Thurs. Oct 1 – 9:00 AM to 10:30 AM: Day 2 Keynote with Hanspeter Pfister, Professor and Computing Visionary, Harvard University
- Fri. Oct 2 – 8:30 AM to 10:00 AM: Day 3 Keynote with Richard Kerris, CTO, Lucasfilm
September 28th, 2009
This article describes computational challenges involved in the study of biomolecular complexes, and relates some of the authors’ early experiences using GPUs to accelerate computationally demanding biomolecular modeling and simulation tasks. The article reviews a number of early successes in the application of GPUs to molecular modeling and touches on future challenges in this rapidly developing area of science and technology. The article is written to be readable by a fairly general audience.
(Probing Biomolecular Machines with Graphics Processors. James C. Phillips, John E. Stone. Communications of the ACM 52(10):34-41, 2009.)
September 22nd, 2009
OpenCurrent is an open source C++ library for solving Partial Differential Equations (PDEs) over regular grids using the CUDA platform from NVIDIA. It breaks down a PDE into 3 basic objects, “Grids”, “Solvers,” and “Equations.” “Grid” data structures efficiently implement regular 1D, 2D, and 3D arrays in both double and single precision. Grids support operations like computing linear combinations, managing host-device memory transfers, interpolating values at non-grid points, and performing array-wide reductions. “Solvers” use these data structures to calculate terms arising from discretizations of PDEs, such as finite-difference based advection and diffusion schemes, and a multigrid solver for Poisson equations. These computational building blocks can be assembled into complete “Equation” objects that solve time-dependent PDEs. One such Equation solver is an incompressible Navier-Stokes solver that uses a second-order Boussinesq model. This equation solver is fully validated, and has been used to study Rayleigh-Benard convection under a variety of different regimes. Benchmarks show it to perform about 8 times faster than an equivalent Fortran code running on an 8-core Xeon.
September 22nd, 2009
nHD is a multi-GPU 2nd order full Godunov three-dimensional uniform-mesh Euler equations solver for calorically ideal, compressible gas. nHD uses CUDA C with MPI and runs on a cluster of multi-GPU machines to accelerate computational hydrodynamics calculations.
The full Godunov method solves the hydrodynamic equations by discretizing the fluid and computing the nonlinear evolution of the discretized distribution, using analytic solutions of Riemann problems. It can therefore resolve arbitrarily severe shock waves with minimal artificial dissipation and oscillation, making it indispensable for simulations of compressible fluids in which shock waves and vacuums arise naturally from the fluid motion.
nHD is open source under a BSD-style license; the code is available, and comments are welcome, at http://code.google.com/p/astro-attic/wiki/NHDIntroduction.
September 7th, 2009
The compute unified device architecture is an almost conventional programming approach for managing computations on a graphics processing unit (GPU) as a data-parallel computing device. With a maximum number of 240 cores in combination with a high memory bandwidth, a recent GPU offers resources for computational physics. We apply this technology to methods of fluctuation analysis, which includes determination of the scaling behavior of a stochastic process and the equilibrium autocorrelation function. Additionally, the recently introduced pattern formation conformity (Preis T et al 2008 Europhys. Lett. 82 68005), which quantifies pattern-based complex short-time correlations of a time series, is calculated on a GPU and analyzed in detail. Results are obtained up to 84 times faster than on a current central processing unit core. When we apply this method to high-frequency time series of the German BUND future, we find significant pattern-based correlations on short time scales. Furthermore, an anti-persistent behavior can be found on short time scales. Additionally, we compare the recent GPU generation, which provides a theoretical peak performance of up to roughly 10^12 (one trillion) floating point operations per second, with the previous one.
(Tobias Preis et al., Accelerated fluctuation analysis by graphic cards and complex pattern formation in financial markets, New J. Phys. 11 093024 (21pp) doi: 10.1088/1367-2630/11/9/093024)
OpenMM is an open-source library that enables molecular dynamics (MD) simulations to be accelerated on high performance computer architectures, such as GPUs. This latest release adds support for:
- A complete set of C and Fortran wrappers
- Energy computations on GPUs
- Ewald summation
- A faster algorithm for handling constraints
- And more!
Download the latest version of OpenMM from http://simtk.org/home/openmm.