CUDPP release 2.0 is a major new release of the CUDA Data-Parallel Primitives Library, with exciting new features. The public interface has undergone a minor redesign to provide thread safety. Parallel reductions (cudppReduce) and a tridiagonal system solver (cudppTridiagonal) have been added, and a new component library, cudpp_hash, provides fast data-parallel hash table functionality. In addition, support for 64-bit data types (double as well as long long and unsigned long long) has been added to all CUDPP algorithms, and a variety of bugs have been fixed. For a complete list of changes, see the change log. CUDPP 2.0 is available for download now.
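As a rough illustration of what a parallel reduction computes, the following Python sketch shows the pairwise (tree) pattern that data-parallel reductions commonly follow. It is not CUDPP code and does not call cudppReduce; the function name tree_reduce and its parameters are hypothetical, and each pass of the loop merely stands in for one parallel step.

```python
import operator

def tree_reduce(values, op=operator.add, identity=0):
    """Reduce `values` with the associative operator `op` in about log2(n) passes.

    Hypothetical illustration only: on a GPU, every pairwise combination
    inside one pass could run concurrently; here the passes run sequentially.
    """
    data = list(values)
    if not data:
        return identity
    while len(data) > 1:
        # Pad odd-length passes with the identity so all elements pair up.
        if len(data) % 2:
            data.append(identity)
        # One "parallel step": combine neighbours (0,1), (2,3), ...
        data = [op(data[i], data[i + 1]) for i in range(0, len(data), 2)]
    return data[0]

total = tree_reduce(range(1, 9))                      # sum of 1..8
largest = tree_reduce([3, 7, 2], max, float("-inf"))  # maximum with -inf as identity
```

Passing the operator and its identity separately is what lets the same pattern serve sums, maxima, and other associative operations.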
A special session on the use of heterogeneous computing for water resources will be held as part of the XIX International Conference on Computational Methods in Water Resources, July 17-21, 2012, at the University of Illinois at Urbana-Champaign. Submissions are due October 1st. Topics include, but are not limited to:
- novel applications of heterogeneous computing resources,
- computational efficiency and performance assessment, and
- accuracy, verification and validation.
This session is focused on the use of heterogeneous computing resources (i.e., the combination of multi-core CPUs and many-core GPUs) for water resources. Over the last ten years, the use of GPUs for computation has gone from academic proofs of concept to industrially viable applications, showing speed-ups of 5-50 times over traditional approaches. Speed is of the utmost importance for many applications in water resources, making the use of heterogeneous computing attractive. In this session, we seek presentations of the state of the art of heterogeneous computing for applications in water resources. Topics of interest include, but are not limited to, novel applications of heterogeneous computing resources; computational efficiency and performance assessment; and accuracy, verification, and validation.
Odeint is a high-level C++ library for solving ordinary differential equations. It is released under an open-source license and supports a variety of methods for solving ODEs. As a special feature, it supports different algebras that perform the basic mathematical operations, which allows the user to solve ordinary differential equations on modern graphics cards. A Thrust interface is implemented, so that the power of CUDA can easily be employed. Furthermore, arbitrary precision types can easily be supported.
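The idea of separating the stepper from the algebra can be sketched in a few lines of Python. This is not odeint's C++ interface: the names ScalarAlgebra, ListAlgebra, scale_sum and rk4_step are all hypothetical. The sketch only assumes that the stepper touches the state through a single algebra operation, so swapping the algebra changes where and how the arithmetic runs.

```python
from fractions import Fraction

class ScalarAlgebra:
    """Hypothetical algebra for states that are single numbers."""
    @staticmethod
    def scale_sum(x, a, y):
        return x + a * y  # x + a*y

class ListAlgebra:
    """Hypothetical algebra for states stored as Python lists."""
    @staticmethod
    def scale_sum(x, a, y):
        return [xi + a * yi for xi, yi in zip(x, y)]  # element-wise x + a*y

def rk4_step(system, x, t, dt, algebra):
    """One classical Runge-Kutta 4 step that only uses algebra.scale_sum."""
    k1 = system(x, t)
    k2 = system(algebra.scale_sum(x, dt / 2, k1), t + dt / 2)
    k3 = system(algebra.scale_sum(x, dt / 2, k2), t + dt / 2)
    k4 = system(algebra.scale_sum(x, dt, k3), t + dt)
    # Accumulate x + dt/6 * (k1 + 2*k2 + 2*k3 + k4) one term at a time.
    out = algebra.scale_sum(x, dt / 6, k1)
    out = algebra.scale_sum(out, dt / 3, k2)
    out = algebra.scale_sum(out, dt / 3, k3)
    return algebra.scale_sum(out, dt / 6, k4)

def oscillator(state, t):
    """Harmonic oscillator: position' = velocity, velocity' = -position."""
    position, velocity = state
    return [velocity, -position]

# List-valued state, integrated from t = 0 to t = 1.
state, t, dt = [1.0, 0.0], 0.0, 0.01
for _ in range(100):
    state = rk4_step(oscillator, state, t, dt, ListAlgebra)
    t += dt

# The same stepper with an exact rational type in place of floats.
exact_step = rk4_step(lambda x, t: -x, Fraction(1), Fraction(0),
                      Fraction(1, 10), ScalarAlgebra)
```

Because the stepper never indexes into the state itself, a GPU-backed container or an arbitrary-precision number type only needs a matching algebra.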
EvoPAR 2012 (Malaga, Spain, 11-13 April 2012) will gather scientists, engineers and practitioners to share and exchange their experiences, discuss challenges, and report state-of-the-art and in-progress research on all aspects of the application of evolutionary algorithms for improving parallel architectures and distributed computing infrastructures, and on the implementation of parallel and distributed evolutionary algorithms.
Submissions are invited by Nov. 30 on topics including, but not limited to, the following:
- Optimization of parallel architectures by means of Evolutionary Algorithms.
- Hardware implementation of EAs, including Field Programmable Gate Arrays (FPGA), GPU, games consoles, mobile devices.
- GPGPU optimisation (CUDA, AMD, ARM, OpenCL, etc.).
Implementing flexible software solutions, such as rendering and ray tracing, is still challenging for GPU programs. The amount of available memory on modern GPUs is relatively small. Scenes for feature film rendering and visualization have large geometric complexity and can easily contain millions of polygons and a large number of texture maps and other data attributes. CentiLeo presents an interactive out-of-core ray tracing engine running on a single desktop GPU. The system is built around a virtual memory manager. A novel ray intersection algorithm built around an acceleration structure, cached on the GPU, loads needed data on demand using page swapping. The ray tracing engine is used to implement a variety of rendering and light transport algorithms. The system is implemented using CUDA and runs on a single NVIDIA GTX 480.
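The on-demand paging idea can be illustrated with a small Python sketch. It is not CentiLeo's implementation: the PageCache class, its load_page callback, and the least-recently-used eviction policy are all assumptions made for illustration, since the announcement does not say which replacement policy the engine uses.

```python
from collections import OrderedDict

class PageCache:
    """Hypothetical fixed-capacity page cache standing in for GPU memory."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity    # number of pages that fit "on the GPU"
        self.load_page = load_page  # callback fetching a page from host memory or disk
        self.pages = OrderedDict()  # page_id -> data, least recently used first
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # swap out the least recently used page
        data = self.load_page(page_id)
        self.pages[page_id] = data
        return data

# A toy "scene" split into pages, larger than the cache can hold.
scene = {i: f"triangles-{i}" for i in range(10)}
cache = PageCache(capacity=2, load_page=scene.__getitem__)
for page_id in [0, 1, 0, 2, 1]:
    cache.get(page_id)
```

In a ray tracer the access sequence would come from rays traversing the acceleration structure, so coherent rays tend to hit pages that are already resident.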
Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid on massively parallel throughput-oriented processors, such as the GPU, demands algorithms with abundant fine-grained parallelism. In this paper, we develop a parallel algebraic multigrid method which exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy as well as the cycling or solve stage. Our algorithms are expressed in terms of scalable parallel primitives that are efficiently implemented on the GPU. The resulting solver achieves an average speedup of over 2x in the setup phase and around 6x in the cycling phase when compared to a representative CPU implementation.
(Nathan Bell, Steven Dalton and Luke Olson: “Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods”, NVIDIA Technical Report NVR-2011-002, June 2011 [PDF and Sources])
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
July 29th, 2011
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics while maintaining a moderate number of unknowns. However, flexibility in the meshes comes with severe drawbacks for parallel execution, especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid, which has high flexibility and adapts well to the demands of modern computing platforms.
In this work we investigate multi-colored Gauss-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins, and factorized sparse approximate inverse (FSAI) smoothers. These approaches provide efficient smoothers with a high degree of parallelism. In combination with matrix-based multigrid methods on unstructured meshes our smoothers provide powerful solvers that are applicable across a wide range of parallel computing platforms and almost arbitrary geometries. We describe the configuration of our smoothers in the context of the portable lmpLAtoolbox and the HiFlow3 parallel finite element package. In our approach, a single source code can be used across diverse platforms including multicore CPUs and GPUs. Highly optimized implementations are hidden behind a unified user interface. Efficiency and scalability of our multigrid solvers are demonstrated by means of a comprehensive performance analysis on multicore CPUs and GPUs.
V. Heuveline, D. Lukarski, N. Trost and J.-P. Weiss. Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs. EMCL Preprint Series No. 9. 2011.
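To show why multi-coloring exposes parallelism in a Gauss-Seidel smoother, here is a minimal Python sketch of the two-color (red-black) case for the 1D model problem -u[i-1] + 2u[i] - u[i+1] = b[i] with zero boundary values. It is not taken from the lmpLAtoolbox or HiFlow3; the function name and the model problem are assumptions chosen for brevity. On unstructured meshes, the colors would instead come from a coloring of the matrix graph.

```python
def red_black_gauss_seidel(b, sweeps):
    """Apply `sweeps` red-black Gauss-Seidel sweeps to the 1D model problem.

    Hypothetical illustration: points of one colour depend only on points
    of the other colour, so each colour's loop could be executed in parallel.
    """
    n = len(b)
    u = [0.0] * n
    for _ in range(sweeps):
        for color in (0, 1):  # 0: even ("red") points, 1: odd ("black") points
            for i in range(color, n, 2):
                left = u[i - 1] if i > 0 else 0.0
                right = u[i + 1] if i < n - 1 else 0.0
                u[i] = 0.5 * (b[i] + left + right)
    return u

# Right-hand side chosen so that the exact solution is all ones.
b = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
u = red_black_gauss_seidel(b, sweeps=200)
```

Inside a multigrid cycle only a few sweeps would be applied per level; the large sweep count here just lets the sketch converge on its own.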
TidePowerd has released Version 2 of their GPU computing solution for the .NET framework, GPU.NET. Their platform allows developers to quickly and easily write GPU-accelerated applications completely in .NET-based languages. Some key benefits include:
- Stay in C# and treat kernel methods like any regular method
- “Boilerplate” GPU programming tasks such as memory transfer and GPU scheduling are abstracted from the developer
- Cross-platform and cross-hardware with a single binary
- Systems seamlessly adapt to new hardware without rewriting code
- Speed on par with native code
New version 2 features:
- Visual Studio Error list and IntelliSense integration
- On-device random number generation
- Double precision support
Jacket 1.8 and LibJacket 1.1 have been released by AccelerEyes, enabling GPU support for MATLAB and easier CUDA development with C/C++/Fortran and Python. New features include:
- Expanded support for the Signal Processing, Image Processing, and Statistics Libraries included with both Jacket and LibJacket
- Faster linear algebra for special systems (e.g. symmetric, positive definite, triangular, etc.)
- Enhanced visualizations
- New and updated examples: FDTD, Mandelbrot fractals, maximum-likelihood neural segmentation, MDS for genomics
- Built with CUDA 4.0 for peak performance
Visit http://www.accelereyes.com/ for details, downloads, whitepapers and tutorials.
TunaCode is pleased to announce the release of CUVI (CUDA Vision and Imaging Library) version 0.5, which comes with a new API and new features. This release makes it even simpler to add acceleration to existing imaging applications, without any prior technical knowledge of GPUs. CUVI v0.5 is built from the bottom up with performance and ease of use in mind.
CUVI version 0.5 can be downloaded from http://cuvilib.com for Windows (Win32, x64); support for Linux and Mac is planned.