A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

October 19th, 2013


We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating to GPU-based kernels and data structures. Our algorithm perform intra-object and interobject collisions, handles contacts and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues in terms of obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement as compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPUbased parallel implementation.

(Min Tang, Roufeng Tong, Rahul Narain, Chang Meng and Dinesh Manocha: “A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation”, in the Proceedings of Pacific Graphics 2013. [WWW])

Workshop: Programming of Heterogeneous Systems in Physics, Oct 5-7, Jena

June 26th, 2011

We are pleased to announce a three-day workshop on “Programming of Heterogeneous Systems in Physics”, a workshop to be held on 5-7 October 2011 at Friedrich-Schiller University, Jena, Germany. This workshop will focus on:

  • Solving partial differential equations efficiently on the heterogeneous computing systems. There is some emphasis on GPU computing, but other accelerators and the efficient use of large multi-core cluster nodes are considered as well.
  • Optimization of computational kernels coming from finite differences, spectral methods, and lattice gauge theory on accelerators.
  • We plan to have a tutorial day, two days of talks and a poster session. We plan for discussion and talks to provide an overview of current work in these areas, and to develop future lines of research and collaborations. The deadline for submission of talks is 15 August 2011.

Please visit http://wwwsfb.tpi.uni-jena.de/Events/Event-PHSP11.shtml for more information. This workshop is organised by G. Zumbusch (Chair, Jena), B. Bruegmann (Jena), A. Weyhausen (Jena), L. Rezzolla (Potsdam) and B. Zink (Tuebingen).


GPIUTMD 0.9.6 released

June 26th, 2011

GPIUTMD stands for Graphic Processors at Isfahan University of Technology for Many-particle Dynamics. It performs general-purpose many-particle dynamic simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to thousands of cores on a fast cluster. Flexible and configurable, GPIUTMD is currently being used for all atom and coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants; dissipative particle dynamics simulations (DPD) of polymers; and crystallization of metals using EAM potentials. GPIUTMD 0.9.6 adds many new features. Highlights include:

  • Morse bond potential
  • Adding constant acceleration to a group of particles. (useful for modeling gravity effects)
  • Computes the full virial stress tensor (useful in mechanical characterization of materials)
  • Long-ranged electrostatics via PPPM
  • Support for CUDA 3.2
  • Theory manual
  • Up to twenty percent boost in simulations
  • and more

A demo version of GPIUTMD 0.9.6 will be available soon for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.


Mesh-particle interpolations on GPUs and multicore CPUs

May 4th, 2011


Particle–mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh–particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45–70×, depending on system size, and an acceleration of 85–155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30–40× for the multicore CPU implementation and 20–45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8–3.7× in single precision and 1.7–2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2–2.8× in double precision.

(Diego Rossinelli, Christian Conti and Petros Koumoutsakos: “Mesh−particle interpolations on GPUs and multicore CPUs”, Phil. Trans. R. Soc. A 2011, 369:2164-2175 [doi])

GPU Linear Solvers for OpenFOAM

May 4th, 2011

ofgpu is a free GPL library from Symscape that provides GPU linear solvers for OpenFOAM®. The experimental library targets NVIDIA CUDA devices on Windows, Linux, and (untested) Mac OS X. It uses the Cusp library’s Krylov solvers to produce equivalent GPU (CUDA-based) versions of the standard OpenFOAM linear solvers:

  • PCG – Preconditioned conjugate gradient solver for symmetric matrices (e.g., p)
  • PBiCG – Preconditioned biconjugate gradient solver for asymmetric matrices (e.g., Ux, k)

ofgpu also has support for the OpenFOAM preconditioners:

  • no
  • diagonal

For more details see “GPU Linear Solver Library for OpenFOAM”. OpenFOAM is a registered trademark of OpenCFD and is unaffiliated with Symscape.

A memory efficient and fast sparse matrix vector product on a GPU

May 4th, 2011


This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats it has both low memory footprint and good throughput. The new format, which we call Sliced ELLR-T has been designed specifically for accelerating the iterative solution of a large sparse and complex-valued system of linear equations arising in computational electromagnetics. Numerical tests have shown that the performance of the new implementation reaches 69 GFLOPS in complex single precision arithmetic. Compared to the optimized six core Central Processing Unit (CPU) (Intel Xeon 5680) this performance implies a speedup by a factor of six. In terms of speed the new format is as fast as the best format published so far and at the same time it does not introduce redundant zero elements which have to be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can be handled using low cost commodity GPUs with limited amount of on-board memory.

(A. Dziekonski, A. Lamecki, and M. Mrozowski: “A memory efficient and fast sparse matrix vector product on a GPU“, Progress In Electromagnetics Research, Vol. 116, 49-63, 2011. [PDF])

SpeedIT 1.2 released

February 1st, 2011

SpeedIT Extreme 1.2 introduces support for complex numbers in single and double precision for all SpeedIT methods, such as fast sparse matrix vector multiplication, CG and BiCGSTAB solver.

OpenFOAM SpeedIT plugin 1.1 released

November 27th, 2010

The OpenFOAM SpeedIT plugin version 1.1 has been released under the GPL License. The most important new features are:

  • Multi-GPU support
  • Tested on Fermi architecture (GTX460 and Tesla C2050)
  • Automated submission of the domain to the GPU cards (using decomposePar from OpenFOAM)
  • Optimized submission of computational tasks to the best GPU card in the system for any number of computational threads
  • Plugin picks the most powerful GPU card for a single thread cases

The OpenFOAM SpeedIT plugin is available at http://speedit.vratis.com.

ACUSIM Software Releases Latest Version of AcuSolve CFD Solver

October 27th, 2010
ACUSim vortex shedding

ACUSim vortex shedding

From a recent press release:

ACUSIM Software, Inc., a leader in computational fluid dynamics (CFD) technology and solutions, today announced the immediate availability of AcuSolve™ 1.8, the latest version of ACUSIM’s leading general-purpose, finite-element based CFD solver. ACUSIM will demonstrate AcuSolve 1.8 during two free webinars, taking place at 9:30 a.m. – 10:30 a.m. ET and 6:30 p.m. – 7:30 p.m. ET, on Oct. 26, 2010, at http://www.acusim.com/html/events.html.

Used by designers and research engineers with all levels of expertise, AcuSolve is highly differentiated by its accelerated speed, robustness, accuracy and multiphysics/multidisciplinary capabilities. Contributing to its robustness is the product’s Galerkin/Least-Square (GLS) finite element formulation and novel iterative linear equation solver for the fully coupled equation system. The combination of these two powerful technologies provides a highly stable and efficient solver, capable of handling unstructured meshes with tight boundary layers automatically generated from complex industrial geometries. Read the rest of this entry »

IMPETUS Afea Solver: A novel Finite Element code adapted to GPU technology

October 16th, 2010

IMPETUS Afea is proud to announce the launch of IMPETUS Afea Solver (version 1.0).

The IMPETUS Afea Solver is a non-linear explicit finite element tool. It is developed to predict large deformations of structures and components exposed to extreme loading conditions. The tool is applicable to transient dynamics and quasi-static loading conditions. The primary focus of the IMPETUS Afea Solver is accuracy, robustness and simplicity for the user. The number of purely numerical parameters that the user has to provide as input is kept at a minimum. The IMPETUS Afea Solver is adapted to GPU technology; utilizing the computational force of a potent graphics card can considerably speed up your calculations.

IMPETUS Afea Solver Video on YouTube

For more information or requests please contact sales@impetus-afea.com