## MicroCFD now runs on CUDA enabled NVIDIA GPUs

October 11th, 2012

The MicroCFD Virtual Wind Tunnel, Educational & Professional Edition, has recently been upgraded. The new version (1.8) supports multi-core CPUs and CUDA enabled GPUs and runs significantly faster than the previous single-processor version. The results of a benchmark test on a system with an Intel quad-core CPU and an NVIDIA 96-core GPU show that an unsteady 2D or axis-symmetric compressible flow can now be run at a resolution of one million cells (Pro Edition) within a few minutes. A 3D version is currently under development and is expected to be released in 2014.

## Accelerating CFD using OpenFOAM with GPUs

September 23rd, 2012

The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base spans a wide range of engineering and science disciplines in both commercial and academic organizations. OpenFOAM has an extensive range of features for solving a wide variety of fluid flows and physical phenomena. It provides tools for all three stages of CFD: preprocessing, solving, and post-processing. Almost all of these tools can run in parallel as standard, making OpenFOAM an important resource for scientists and engineers using HPC for CFD.

General-purpose Graphics Processing Unit (GPU) technology is increasingly being used to accelerate compute-intensive HPC applications across various disciplines in the HPC community. OpenFOAM CFD simulations can take a significant amount of time and are computationally intensive. Comparing various alternatives for enabling faster research and discovery using CFD is of key importance. SpeedIT libraries from Vratis provide GPU-accelerated iterative solvers that replace the iterative solvers in OpenFOAM.

To investigate GPU acceleration of OpenFOAM, we simulate the three-dimensional lid-driven cavity problem based on the tutorial provided with OpenFOAM. This is an incompressible flow problem solved with the OpenFOAM icoFoam solver. Most of the solver's computational cost lies in solving the pressure equation. In the accelerated case, only the pressure calculation is offloaded to the GPUs. On the CPUs, the PCG solver with the DIC preconditioner is used. In the GPU-accelerated case, the SpeedIT 2.1 algebraic multigrid preconditioner with smoothed aggregation (AMG) is used in combination with the SpeedIT Plugin to OpenFOAM.

## Webinar: Scaling Soft Matter Physics to a Thousand GPUs and Beyond

September 22nd, 2012

The “Ludwig” lattice Boltzmann fluid dynamics application is a versatile code capable of simulating the hydrodynamics of complex fluids (e.g. mixtures, surfactants, liquid crystals, particle suspensions) to allow cutting-edge research into condensed matter physics. On October 3, Dr. Alan Gray from the University of Edinburgh presents a webinar on his team’s experiences in scaling the application on the Cray XK6 hybrid supercomputer. The presentation will cover:

- A review of excellent scaling up to O(1000) GPUs
- Steps taken to maximize performance on each GPU
- Designing the communication to allow efficient usage of many GPUs in parallel, including the overlapping of several stages using CUDA stream functionality
- Advanced functionality, including how to include colloidal particles in the simulation while minimizing data transfer overheads

Register at http://www.gputechconf.com/page/gtc-express-webinar.html.

## Accelerate OpenFOAM with SpeedIT 2.1

May 24th, 2012

SpeedIT provides a set of accelerated solvers for sparse linear systems of equations. The library supports C/C++ and Fortran, and it can be used with OpenFOAM to accelerate CFD simulations. SpeedIT 2.1 contains two new preconditioners:

- Algebraic Multigrid with Smoothed Aggregation (AMG)
- Approximate Inverse (AINV)

OpenFOAM simulations on the GPU can be up to 3.5x faster than CPU runs that use CG with DIC/DILU preconditioners, and up to 1.6x faster when the CPU baseline uses GAMG.

See the SpeedIT website and blog for more details.

## Wall Orientation and Shear Stress in the Lattice Boltzmann Model

March 16th, 2012

Abstract:

The wall shear stress is a quantity of profound importance for clinical diagnosis of artery diseases. The lattice Boltzmann method is an easily parallelizable numerical method of solving flow problems, but it suffers from errors of the velocity field near the boundaries, which lead to errors in the wall shear stress and normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute wall normals, which are necessary to compute the wall shear stress, by taking the weighted mean over boundary facets lying in the vicinity of a wall element. We carry out several tests and observe an increase in the accuracy of computed normal vectors over other methods in two and three dimensions. Using the scheme we compute the wall shear stress in an inclined and bent channel fluid flow and show a minor influence of the normal on the numerical error, implying that the main error arises due to a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta in steady conditions using our method and compare the results with a standard finite volume solver and experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed.

(Maciej Matyka, Zbigniew Koza, Łukasz Mirosław: *“Wall Orientation and Shear Stress in the Lattice Boltzmann Model”*, Preprint, 2012. [arXiv])

## The CUDA implementation of the method of lines for the curvature dependent flows

March 12th, 2012

Abstract:

We study the use of a GPU for the numerical approximation of the curvature dependent flows of graphs – the mean-curvature flow and the Willmore flow. Both problems are often applied in image processing where fast solvers are required. We approximate these problems using the complementary finite volume method combined with the method of lines. We obtain a system of ordinary differential equations which we solve by the Runge–Kutta–Merson solver. It is a robust solver with an automatic choice of the integration time step. We implement this solver on the CPU and also on the GPU using the CUDA toolkit. We demonstrate that the mean-curvature flow can be successfully approximated in single precision arithmetic with a speed-up of almost 17 on the Nvidia GeForce GTX 280 card compared to an Intel Core 2 Quad CPU. On the same card, we obtain a speed-up of 7 in double precision arithmetic, which is necessary for the fourth order problem – the Willmore flow of graphs. Both speed-ups were achieved without affecting the accuracy of the approximation. The article is structured in such a way that the reader interested only in the implementation of the Runge–Kutta–Merson solver on the GPU can skip the sections containing the mathematical formulation of the problems.

(Oberhuber T., Suzuki A., Žabka V.: *“The CUDA implementation of the method of lines for the curvature dependent flows”*, Kybernetika 47(2):251–272, 2011. [PDF])
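The Runge–Kutta–Merson scheme with automatic step selection mentioned in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' CUDA implementation: it handles a single scalar ODE, uses the textbook Merson coefficients, and the function names `rkm_step` and `integrate`, the safety factor 0.8, and the growth limits are assumptions made for illustration.

```python
import math


def rkm_step(f, t, y, h):
    """One Runge-Kutta-Merson step for a scalar ODE y' = f(t, y).

    Returns the 4th-order update and Merson's local error estimate.
    """
    k1 = f(t, y)
    k2 = f(t + h / 3, y + h / 3 * k1)
    k3 = f(t + h / 3, y + h / 6 * (k1 + k2))
    k4 = f(t + h / 2, y + h / 8 * (k1 + 3 * k3))
    k5 = f(t + h, y + h * (0.5 * k1 - 1.5 * k3 + 2 * k4))
    y_new = y + h / 6 * (k1 + 4 * k4 + k5)
    err = abs(h / 30 * (2 * k1 - 9 * k3 + 8 * k4 - k5))
    return y_new, err


def integrate(f, t0, y0, t_end, tol=1e-8, h=0.1):
    """Integrate from t0 to t_end, adapting h so each step's error <= tol."""
    t, y, steps = t0, y0, 0
    while t < t_end:
        h = min(h, t_end - t)          # never step past the end time
        y_new, err = rkm_step(f, t, y, h)
        if err <= tol:                 # accept the step
            t = t_end if h == t_end - t else t + h
            y = y_new
            steps += 1
        # Rescale h for the next attempt (assumed safety factor and limits).
        if err > 0:
            h *= min(2.0, max(0.1, 0.8 * (tol / err) ** 0.2))
        else:
            h *= 2.0
    return y, steps


# Usage: exponential decay y' = -y, y(0) = 1, integrated to t = 1.
y_end, n_steps = integrate(lambda t, y: -y, 0.0, 1.0, 1.0)
```

In a method-of-lines setting `y` would be a vector of grid values and the error would be taken as a maximum over all components, which is the part that maps naturally onto a parallel reduction on the GPU.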

## SpeedIT 2.0 released

February 24th, 2012

SpeedIT 2.0 and the SpeedIT plugin to OpenFOAM have been released. New features include:

- One of the fastest sparse matrix-vector multiplication routines worldwide.
- Faster Conjugate Gradient and BiConjugate Gradient solvers.
- State-of-the-art CMRS format for storing sparse matrices. The format requires less memory than CRS or HYB (from CUSPARSE and CUSP).
- Faster acceleration in OpenFOAM (Computational Fluid Dynamics).

More information is available at http://speed-it.vratis.com.
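For readers unfamiliar with the CRS baseline mentioned in the feature list, the following minimal sketch shows how compressed row storage represents a sparse matrix and how a matrix-vector product walks it. It illustrates plain CRS only; the CMRS format is not described here and is not reproduced. The helper names `dense_to_crs` and `crs_matvec` are hypothetical.

```python
def dense_to_crs(dense):
    """Convert a dense list-of-lists matrix to CRS arrays.

    values  : nonzero entries, row by row
    col_idx : column index of each stored entry
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix stored in CRS form."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


# Usage: a small tridiagonal matrix of the kind produced by discretized PDEs.
A = [[4, -1, 0],
     [-1, 4, -1],
     [0, -1, 4]]
values, col_idx, row_ptr = dense_to_crs(A)
y = crs_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each row's loop is independent of the others, which is why this kernel parallelizes well; the indirect access `x[col_idx[k]]` is what makes it memory-bound and motivates alternative storage formats.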

## GPU and APU computations of Finite Time Lyapunov Exponent fields

February 1st, 2012

We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.

(Conti C., Rossinelli D., Koumoutsakos P., *GPU and APU computations of Finite Time Lyapunov Exponent fields*, Journal of Computational Physics, 231(5):2229–2244, 2012.)
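As background, the standard definition of the FTLE at a point is the growth rate of the largest stretching of the flow map over a time T: the logarithm of the square root of the largest eigenvalue of the Cauchy–Green tensor, divided by |T|. The sketch below evaluates that definition at a single point in 2D. It is not the paper's method: it uses an assumed analytic flow map for the saddle flow dx/dt = x, dy/dt = -y instead of resampling a computed velocity field, and the names `saddle_flow_map` and `ftle` are hypothetical.

```python
import math


def saddle_flow_map(x, y, T):
    """Analytic flow map of the assumed test flow dx/dt = x, dy/dt = -y."""
    return x * math.exp(T), y * math.exp(-T)


def ftle(flow, x, y, T, d=1e-4):
    """FTLE at (x, y) from central differences of the flow map."""
    xr, yr = flow(x + d, y, T)
    xl, yl = flow(x - d, y, T)
    xu, yu = flow(x, y + d, T)
    xd, yd = flow(x, y - d, T)
    # Flow-map gradient F = [[a, b], [c, e]].
    a = (xr - xl) / (2 * d)
    b = (xu - xd) / (2 * d)
    c = (yr - yl) / (2 * d)
    e = (yu - yd) / (2 * d)
    # Cauchy-Green tensor C = F^T F (symmetric 2x2).
    c11 = a * a + c * c
    c12 = a * b + c * e
    c22 = b * b + e * e
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (c11 + c22) + math.sqrt((0.5 * (c11 - c22)) ** 2 + c12 ** 2)
    return math.log(math.sqrt(lam_max)) / abs(T)


# Usage: for the saddle flow the stretching rate is 1 everywhere.
sigma = ftle(saddle_flow_map, 0.3, -0.2, 2.0)
```

Computing a full FTLE field repeats this evaluation at every grid point, with the flow map obtained by advecting particles through the velocity field, which is where the memory-bandwidth-bound resampling described above comes in.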

## Symscape Releases Caedium v3.0 with GPU Support

October 20th, 2011

The latest release of Symscape’s Caedium (v3.0) now has support for CFD simulations using NVIDIA CUDA GPU devices on Windows and Linux. Caedium is an integrated simulation environment that targets Computational Fluid Dynamics (CFD). The GPU support is provided by Symscape’s ofgpu linear solver library for OpenFOAM®. For more details see:

http://www.symscape.com/news/hybrid-cfd-modeling-cloud-computing

## GPU Linear Solvers for OpenFOAM

May 4th, 2011

ofgpu is a free GPL library from Symscape that provides GPU linear solvers for OpenFOAM®. The experimental library targets NVIDIA CUDA devices on Windows, Linux, and (untested) Mac OS X. It uses the Cusp library’s Krylov solvers to produce equivalent GPU (CUDA-based) versions of the standard OpenFOAM linear solvers:

- PCG – Preconditioned conjugate gradient solver for symmetric matrices (e.g., p)
- PBiCG – Preconditioned biconjugate gradient solver for asymmetric matrices (e.g., Ux, k)

ofgpu also has support for the OpenFOAM preconditioners:

- no
- diagonal

For more details see “GPU Linear Solver Library for OpenFOAM”. OpenFOAM is a registered trademark of OpenCFD and is unaffiliated with Symscape.
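To make the PCG solver and the diagonal preconditioner listed above concrete, here is a minimal CPU sketch of the preconditioned conjugate gradient algorithm for a symmetric positive definite system. It is a textbook version in pure Python with dense storage, not ofgpu's or Cusp's code; the function name `pcg_diagonal` and its parameters are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product (a real solver would use sparse storage)."""
    return [dot(row, v) for row in A]


def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Uses conjugate gradients with a diagonal (Jacobi) preconditioner,
    i.e. the residual is scaled by 1 / A[i][i] in every iteration.
    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: a small symmetric positive definite system with solution [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
x, iterations = pcg_diagonal(A, [2.0, 4.0, 10.0])
```

Every operation in the loop is a matrix-vector product, a dot product, or a vector update, so the whole iteration can stay on the GPU; the diagonal preconditioner is attractive there because it is an elementwise scaling with no sequential dependencies.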