Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids

March 29th, 2011


We present a computational method of coupling average interpolating wavelets with high-order finite volume schemes and its implementation on heterogeneous computer architectures for the simulation of multiphase compressible flows. The method is implemented to take advantage of the parallel computing capabilities of emerging heterogeneous multicore/multi-GPU architectures. A highly efficient parallel implementation is achieved by introducing the concept of wavelet blocks, exploiting the task-based parallelism for CPU cores, and by managing asynchronously an array of GPUs by means of OpenCL. We investigate the comparative accuracy of the GPU and CPU based simulations and analyze their discrepancy for two-dimensional simulations of shock-bubble interaction and Richtmyer–Meshkov instability. The results indicate that the accuracy of the GPU/CPU heterogeneous solver is competitive with the one that uses exclusively the CPU cores. We report the performance improvements by employing up to 12 cores and 6 GPUs compared to the single-core execution. For the simulation of the shock-bubble interaction at Mach 3 with two million grid points, we observe a 100-fold speedup for the heterogeneous part and an overall speedup of 34.

(Rossinelli D., Hejazialhosseini B., Spampinato D., Koumoutsakos P.: “Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids”, SIAM Journal on Scientific Computing 33:512-540, 2011 [DOI])

GTC 2011 Call for Submissions

March 22nd, 2011

The call for submissions for the GPU Technology Conference 2011 (GTC), October 11-14, is now open. You can find more details and instructions for submitting here.

Expanding the already comprehensive breadth of topics covered at GTC 2010, the GTC Content Committee has added new topic areas for 2011. Below is a partial list; see the GTC website for full details:

  • Application Design & Porting Techniques
  • Bioinformatics
  • Climate & Weather Modeling
  • Cluster Management
  • Computational Structural Mechanics
  • Parallel Programming Languages
  • Supercomputing

GTC is also looking for posters that describe novel or interesting research topics in parallel computing, visual computing, and applications of GPUs, with a particular interest in submissions describing GPU computing and CUDA applications that solve diverse problems in scientific and engineering domains.

Read the rest of this entry »

CUDAfy – GPGPU completely in .NET

March 21st, 2011

From a recent press release:

CUDAfy is a .NET SDK that allows you to write, debug and emulate CUDA GPU applications in any .NET language including C# or Visual Basic. The aim is to bring the power of GPGPU to the large number of .NET developers out there. Features include:

  • .NET object-oriented CUDA model (GThread)
  • Write .NET code marking methods, structures and constants that should be translated to CUDA (“Cudafying”)
  • An add-in for Red Gate’s .NET Reflector tool that translates to CUDA C
  • Built-in emulation of GPU kernel functions
  • 1D, 2D and 3D array support including access to Array class’s Length, GetLength and Rank members
  • Use all standard .NET value types. No new types even for managing data allocated on GPU
  • Simple .NET wrapper for CUFFT and CUBLAS

During our work with the European Space Agency, Astrium and NLR we saw how GPUs could significantly improve performance of the emulation of algorithms targeted on FPGAs and ASICs. The SDEs and SDKs produced were .NET based and CUDAfy is the result of efforts to more tightly integrate the GPU and CPU code development. There are user guides and sample projects. Many of the samples in the book CUDA by Example have been ported to .NET. See for downloads and more information.

OpenCL Studio 1.0 has been released

March 21st, 2011

OpenCL Studio combines OpenCL and OpenGL into a single integrated development environment for high performance computing. The feature-rich editor, interactive scripting language and extensible plug-in architecture support the rapid development of complex parallel algorithms and accompanying visualization. The first production version of OpenCL Studio, including instructional videos and demo applications, is available at

Jacket v1.7 for Faster MATLAB® Code

March 21st, 2011

AccelerEyes has released version 1.7 of Jacket for GPU computing with MATLAB®. Version 1.7 delivers even more speed to MATLAB with a new Sparse Linear Algebra Library, a new Signal Processing Library, a big boost to convolution functions, and more.

Jacket is the premier GPU software plugin for MATLAB. It enables rapid prototyping and problem solving across a range of government, manufacturing, energy, media, biomedical, financial, and scientific research applications. Jacket accelerates performance of common arithmetic and linear algebra functionality using the complete line of CUDA-capable GPUs from NVIDIA, including top of the line Tesla GPUs as well as Quadro visualization GPUs and GeForce gaming GPUs.

Some of the new features available with Jacket 1.7 include:

Read the rest of this entry »

Exact string matching algorithms in CUDA

March 21st, 2011

Exact string matching algorithms are heavily used in applications such as antivirus engines, DNA sequencing and text editors. This project provides CUDA implementations of the naive, Horspool and Quick Search algorithms, including a performance comparison against CPU versions:
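For readers unfamiliar with the Horspool algorithm named above, here is a minimal sequential sketch in Python. It is not the project's CUDA code. The function name horspool_search and the choice to return every match position are assumptions made for illustration only.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Sequential illustration of Horspool's bad-character shift; a GPU
    version would typically split the text into chunks per thread.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: for each character in the pattern except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Slide the window using the text character aligned with the
        # last pattern position; unseen characters allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

The naive algorithm always advances the window by one position. Horspool can skip up to the full pattern length whenever the last character in the window does not occur in the pattern.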

CUDA 4.0 Release Aims to Make Parallel Programming Easier

March 1st, 2011

Today NVIDIA announced the upcoming 4.0 release of CUDA.  While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features.  With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool.  Full details follow in the quoted press release below.

Read the rest of this entry »

PFAC: A library for string matching on NVIDIA GPUs

February 28th, 2011

PFAC, the Parallel Failureless Aho-Corasick algorithm, is a variant of the well-known Aho-Corasick (AC) algorithm with all failure transitions removed. The purpose of PFAC is to match all longest patterns in a given input stream against patterns pre-defined by users. The data-parallel nature of PFAC makes it perform well on GPUs, especially NVIDIA Fermi-based GPUs. The PFAC library, implemented in CUDA, provides a C-level API that is easy to use. Users need not know CUDA programming. The user guide provides a simple example to make it easy to use PFAC for content searches or virus detection on the GPU.
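The idea of dropping failure transitions can be illustrated with a small sequential emulation in Python. This is a sketch only and does not use the PFAC library's API. The names build_trie and pfac_match are hypothetical, and so is the output convention, which assumes that each start position reports the ID of the longest pattern beginning there, or -1 if none does. Each iteration of the outer loop stands in for one GPU thread.

```python
def build_trie(patterns):
    """Build a plain trie (no failure links) over non-empty patterns."""
    goto = [{}]        # goto[node] maps a character to the next node
    terminal = [-1]    # terminal[node] is a pattern ID, or -1
    for pid, pat in enumerate(patterns):
        node = 0
        for ch in pat:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                terminal.append(-1)
            node = nxt
        terminal[node] = pid
    return goto, terminal


def pfac_match(text, patterns):
    """For each start position, return the ID of the longest pattern
    that begins there, or -1 (assumed output convention)."""
    goto, terminal = build_trie(patterns)
    result = [-1] * len(text)
    for start in range(len(text)):   # conceptually: one thread per position
        node = 0
        pos = start
        while pos < len(text):
            node = goto[node].get(text[pos])
            if node is None:
                break                # no valid transition: this thread stops
            if terminal[node] != -1:
                result[start] = terminal[node]  # keep the longest so far
            pos += 1
    return result
```

Because a walk simply ends when no transition exists, the walks are independent of one another, which is what makes the scheme map naturally onto many GPU threads.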

The PFAC library does not use multiple GPUs intrinsically, but users can combine the PFAC library with OpenMP or PThreads to perform string matching on multiple GPUs. The PFAC release includes OpenMP and PThreads examples. Download and further information:

New GPGPU Meetup Groups: NYC, Boston, Chicago, Tokyo and More

February 28th, 2011

Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at

Check out our User Groups page for more.

HIPHAC’11 Proceedings Available

February 20th, 2011

Proceedings from the 2nd International Workshop on High Performance and Hardware-Aware Computing (HIPHAC 2011) are now available from KIT Scientific Publishing. Individual copies can be ordered here, and the electronic proceedings are available free of charge.
