Accelerate Your Science on the Titan Supercomputer

April 1st, 2012

Accelerate your science on the Titan supercomputer later this year by harnessing up to 20 petaflops of GPU-accelerated parallel processing. Open to researchers from academia, government labs, and industry, the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program is the major means by which the scientific community gains access to some of the fastest supercomputers.

First, let INCITE know you are interested in GPU acceleration by completing a two-minute survey. Then determine if you want to submit a formal proposal by June 27, 2012.

Need help drafting your proposal? Attend a “how-to” webinar on Tuesday, April 24, to learn tips and tricks for putting one together. For further questions about the call for proposals, please contact the INCITE manager at INCITE@DOEleadershipcomputing.org.

Adaptive Row-Grouped CSR Format For Storing of Sparse Matrices on GPU

April 1st, 2012

Abstract:

We present a new adaptive format for storing sparse matrices on the GPU. We compare it with several other formats, including CUSPARSE, which is currently probably the best choice for processing sparse matrices on the GPU in CUDA. Unlike CUSPARSE, which works with the common CSR format, our new format requires a conversion step. However, sparse matrix-vector multiplication is significantly faster for many matrices. We demonstrate this on a set of 1600 matrices and show for which types of matrices our format is profitable.

(Heller M., Oberhuber T., “Adaptive Row-Grouped CSR Format For Storing of Sparse Matrices on GPU”, preprint on arXiv.org, 2012 [PDF])
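For readers less familiar with the baseline being improved upon, the following is a minimal sketch (not code from the paper) of the standard scalar CSR sparse matrix-vector product in CUDA, with one thread per row; the array names and launch configuration are illustrative assumptions.

```cuda
// Scalar CSR SpMV, y = A*x, one thread per row.
// rowPtr, colIdx and vals are the standard CSR arrays; names are illustrative.
__global__ void spmv_csr_scalar(int numRows,
                                const int*   __restrict__ rowPtr,
                                const int*   __restrict__ colIdx,
                                const float* __restrict__ vals,
                                const float* __restrict__ x,
                                float*       __restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numRows) {
        float sum = 0.0f;
        // Each thread accumulates the dot product of its row with x.
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += vals[j] * x[colIdx[j]];
        y[row] = sum;
    }
}

// Hypothetical launch: 256 threads per block, one thread per matrix row.
// spmv_csr_scalar<<<(numRows + 255) / 256, 256>>>(numRows, dRowPtr, dColIdx, dVals, dX, dY);
```

Because each thread traverses an entire row, rows of very different lengths leave warps partially idle and memory accesses poorly coalesced; this irregularity is what row-grouped and adaptive formats aim to smooth out, at the cost of the one-time conversion step mentioned above.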

OpenCL Programming Webinar Series

March 30th, 2012

AMD offers an OpenCL Programming Webinar Series to help software developers become experts in the latest technologies, standards and best practices. The series of three OpenCL webinars will be presented by Rob Farber.

1. April 10th, 10AM PDT: Introducing Portable Parallelism

  • C and C++ APIs
  • OpenCL Memory Spaces
  • The OpenCL Execution Model

2. April 24th, 10AM PDT: Coordinating OpenCL Computations on One or More Heterogeneous Devices

  • How to Concisely Utilize Multiple Command Queues and Coordinate Tasks Across Multiple Heterogeneous Devices, such as Two GPUs Plus a CPU
  • Code Sample Discussion: Massively Parallel Random Number Test Framework

3. May 1st, 10AM PDT: Accelerate Rendering by an Order of Magnitude with OpenCL, Plus a View to the Multi-core and Web-enabled Future

  • How to use OpenCL to Provide High-Quality, Fast Rendering in Combination with Primitive Restart
  • Device Fission: Partitioning Hardware Capabilities for Optimal Resource Usage
  • Looking to the Future – WebCL

Registration is limited. More Information: http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx

Call for Participation: Accelerating Computational Science Symposium 2012 (ACSS)

March 18th, 2012

You are cordially invited to attend the Accelerating Computational Science Symposium 2012 (ACSS). This symposium is designed to advance the understanding of hybrid-computing architectures and how they are accelerating progress in scientific research.

Hosted by the Oak Ridge Leadership Computing Facility (OLCF), along with the National Center for Supercomputing Applications (NCSA) and the Swiss National Supercomputing Centre (CSCS), the symposium takes place March 29-30, 2012, in Washington, DC.

The complete agenda and additional information about the symposium are available at http://www.olcf.ornl.gov/event/accelerating-computational-science-symposium-2012-acss-2012/.


Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods

March 18th, 2012

Abstract:

Modern GPUs are well suited for performing image processing tasks. We utilize their high computational performance and memory bandwidth for image segmentation purposes. We segment cardiac MRI data by means of numerical solution of an anisotropic partial differential equation of the Allen-Cahn type. We implement two different algorithms for solving the equation on the CUDA architecture. One of them is based on the Runge-Kutta-Merson method for the approximation of solutions of ordinary differential equations, the other uses the GMRES method for the numerical solution of systems of linear equations. In our experiments, the CUDA implementations of both algorithms are about 3–9 times faster than corresponding 12-threaded OpenMP implementations.

(Oberhuber T., Suzuki A., Vacata J., Žabka V., “Image segmentation using CUDA implementations of the Runge-Kutta-Merson and GMRES methods”, Journal of Math-for-Industry, 2011, vol. 3, pp. 73–79 [PDF])
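The abstract does not reproduce the specific anisotropic Allen-Cahn formulation used in the paper; purely for orientation, a common isotropic form of the Allen-Cahn equation for a phase function u with interface-width parameter ξ is

\[
  \frac{\partial u}{\partial t} \;=\; \Delta u \;-\; \frac{1}{\xi^{2}}\, w'(u),
  \qquad w(u) \;=\; \tfrac{1}{4}\bigl(u^{2}-1\bigr)^{2}.
\]

Discretizing in space and integrating the resulting system of ODEs explicitly is where an adaptive solver such as Runge-Kutta-Merson comes in, while a (semi-)implicit time discretization leads to large sparse linear systems of the kind GMRES is designed to solve.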

Wall Orientation and Shear Stress in the Lattice Boltzmann Model

March 16th, 2012

Abstract:

The wall shear stress is a quantity of profound importance for the clinical diagnosis of artery diseases. The lattice Boltzmann method is an easily parallelizable numerical method for solving flow problems, but it suffers from errors in the velocity field near the boundaries, which lead to errors in the wall shear stress and in the normal vectors computed from the velocity. In this work we present a simple formula to calculate the wall shear stress in the lattice Boltzmann model and propose to compute the wall normals, which are necessary to compute the wall shear stress, by taking a weighted mean over the boundary facets lying in the vicinity of a wall element. We carry out several tests and observe an increase in the accuracy of the computed normal vectors over other methods in two and three dimensions. Using this scheme we compute the wall shear stress in an inclined and a bent channel fluid flow and show a minor influence of the normals on the numerical error, implying that the main error arises from a corrupted velocity field near the staircase boundary. Finally, we calculate the wall shear stress in the human abdominal aorta under steady conditions using our method and compare the results with a standard finite volume solver and with experimental data available in the literature. Applications of our ideas in a simplified protocol for data preprocessing in medical applications are discussed.

(Maciej Matyka, Zbigniew Koza, Łukasz Mirosław: “Wall Orientation and Shear Stress in the Lattice Boltzmann Model”, Preprint, 2012. [arXiv])
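For readers unfamiliar with the quantity, the wall shear stress of a Newtonian fluid is conventionally obtained from the velocity gradient at the wall; in its simplest scalar form (not the lattice-specific formula derived in the paper),

\[
  \tau_w \;=\; \mu \left. \frac{\partial u_t}{\partial n} \right|_{\text{wall}},
\]

where u_t is the velocity component tangential to the wall, n the wall-normal coordinate and μ the dynamic viscosity. This is why errors in the estimated wall normals propagate directly into the computed shear stress.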

Compressed Multiple-Row Storage Format

March 16th, 2012

Abstract:

A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern throughput-oriented computer architectures. This format extends the standard compressed row storage (CRS) format and is easily convertible to and from it without any memory overhead. Computational performance of an SpMV kernel for the new format is determined for over 140 sparse matrices on two Fermi-class graphics processing units (GPUs) and the efficiency of the kernel, which peaks at 36 and 25 GFLOPS at single and double precision, respectively, is compared with that of five existing generic algorithms and industrial implementations. The efficiency of the new format is also measured as a function of the mean (μ) and of the standard deviation (σ) of the number of matrix nonzero elements per row. The largest speedup is found for matrices with μ > 20 and μ > σ > 1.5 and can be as high as 43%.

(Zbigniew Koza, Maciej Matyka, Sebastian Szkoda, Łukasz Mirosław: “Compressed Multiple-Row Storage Format”, Preprint, 2012. [arXiv])
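Since the abstract characterizes when the new format pays off in terms of the per-row nonzero statistics μ (mean) and σ (standard deviation), a small host-side helper along the following lines (illustrative only, not taken from the paper) can compute them from a standard CRS/CSR row-pointer array to check whether a matrix falls into the favourable regime (μ > 20 and μ > σ > 1.5):

```cuda
// Host-side helper (plain C++): mean (mu) and standard deviation (sigma)
// of the number of nonzeros per row, derived from a CRS/CSR row-pointer array.
#include <algorithm>
#include <cmath>
#include <vector>

void rowStats(const std::vector<int>& rowPtr, double& mu, double& sigma)
{
    const int numRows = static_cast<int>(rowPtr.size()) - 1;
    double sum = 0.0, sumSq = 0.0;
    for (int i = 0; i < numRows; ++i) {
        const double nnz = rowPtr[i + 1] - rowPtr[i];
        sum   += nnz;
        sumSq += nnz * nnz;
    }
    mu    = sum / numRows;
    sigma = std::sqrt(std::max(0.0, sumSq / numRows - mu * mu));
}
```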

New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA

March 14th, 2012

Abstract:

A new format for storing sparse matrices is suggested. It is designed to perform well mainly on GPU devices, and its implementation in CUDA is presented. Its performance is tested on 1600 matrices of different types. The format is compared in detail with a hybrid format, and the strong and weak points of both formats are shown.

(Oberhuber T., Suzuki A., Vacata J.: “New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA”, Acta Technica 56: 447-466, 2011 [PDF])

CFP: UKPEW 2012 – 28th UK Performance Engineering Workshop

March 14th, 2012

UKPEW is the leading UK forum for the presentation of all aspects of performance modelling and analysis of computer and telecommunication systems. Original papers are invited on all relevant topics but papers on or related to the subjects listed below are particularly welcome.

Topics of interest include, but are not limited to:


GPU accelerated Convex Hull Computation

March 12th, 2012

Abstract:

We present a hybrid algorithm to compute the convex hull of points in three and higher dimensional spaces. Our formulation uses a GPU-based interior point filter to cull away many of the points that do not belong to the boundary. The convex hull of the remaining points is computed on the CPU. The GPU-based filter proceeds in an incremental manner and computes a pseudo-hull that is contained inside the convex hull of the original points. The pseudo-hull computation involves only localized operations and therefore maps well to GPU architectures. Furthermore, the underlying approach extends to high-dimensional point sets and deforming points. In practice, our culling filter can reduce the number of candidate points by two orders of magnitude. We have implemented the hybrid algorithm on commodity GPUs and evaluated its performance on several large point sets. In practice, the GPU-based filtering algorithm can cull up to 85M interior points per second on an NVIDIA GeForce GTX 580, and the hybrid algorithm improves the overall performance of convex hull computation by 10-27 times (for static point sets) and 22-46 times (for deforming point sets).

(Min Tang, Jie-yi Zhao, Ruofeng Tong, and Dinesh Manocha: “GPU accelerated Convex Hull Computation”, accepted by SMI’2012. [WWW] [PREPRINT])
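The paper's incremental pseudo-hull construction is not spelled out in the abstract; as a much-simplified illustration of the interior-point culling idea only, a GPU filter can discard every point that lies strictly inside a known inner polytope (for instance the simplex spanned by a few extreme points found beforehand), since such points cannot be hull vertices. All names and parameters below are hypothetical.

```cuda
// Simplified interior-point culling filter (illustrative, not the paper's algorithm).
// planes[k] = (nx, ny, nz, d) encodes the half-space n·p + d <= 0 of one face of an
// inner polytope; a point strictly inside all faces is discarded (keep = 0).
__global__ void cullInteriorPoints(int numPoints,
                                   const float4* __restrict__ points,  // w unused
                                   const float4* __restrict__ planes,
                                   int numPlanes,
                                   int* __restrict__ keep)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPoints) return;

    const float4 p = points[i];
    bool inside = true;
    for (int k = 0; k < numPlanes; ++k) {
        const float4 h = planes[k];
        const float s = h.x * p.x + h.y * p.y + h.z * p.z + h.w;
        if (s >= 0.0f) { inside = false; break; }  // on or outside a face: keep as candidate
    }
    keep[i] = inside ? 0 : 1;
}
```

The surviving candidates would then be handed to an exact CPU convex-hull code, mirroring the hybrid division of labour described above.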
