December 8th, 2009
Abstract:
This paper presents, to the author’s knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA’s Compute Unified Device Architecture (CUDA) and compare the performance to other similar programs in chaotic inflation models. We report speedups between one and two orders of magnitude depending on the used hardware and software while achieving small errors in single precision. Simulations that used to last roughly one day to compute can now be done in hours and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY and users of the aforementioned program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available under the GNU General Public License.
The program is freely available at http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/
(Jani Sainio. “CUDAEASY – a GPU Accelerated Cosmological Lattice Program”. submitted to Computer Physics Communications (under review). November 2009.)
Posted in Research | Tags: Astrophysics, Cosmology, NVIDIA CUDA, Open Source, Papers | Write a comment
December 2nd, 2009
High throughput architectures for HPC seem likely to emphasize many cores with deep multithreading, wide SIMD, and sophisticated memory hierarchies. GPUs present one example, and their high throughput has led a number of researchers to port computationally intensive applications to NVIDIA’s CUDA architecture.
This session explored the art of performance tuning for CUDA using several case studies. Topics included profiling to identify bottlenecks, effective use of the GPU’s memory hierarchy and DRAM interface to maximize bandwidth, data versus task parallelism, and avoiding SIMD divergence. Many of the lessons learned in the context of CUDA are likely to apply to other many-core architectures used in HPC applications.
Posted in Events | Tags: Birds-of-a-Feather, Conferences, NVIDIA CUDA, Supercomputing | Write a comment
November 30th, 2009
The presentation slides from the Supercomputing 2009 full-day tutorial “High-Performance Computing with CUDA” are now available at http://gpgpu.org/sc2009.
Abstract:
NVIDIA’s CUDA is a general-purpose architecture for writing highly parallel applications. CUDA provides several key abstractions—a hierarchy of thread blocks, shared memory, and barrier synchronization—for scalable high-performance parallel computing. Scientists throughout industry and academia use CUDA to achieve dramatic speedups on production and research codes. The CUDA architecture supports many languages, programming environments, and libraries including C, Fortran, OpenCL, DirectX Compute, Python, Matlab, FFT, LAPACK, etc.
In this tutorial NVIDIA engineers will partner with academic and industrial researchers to present CUDA and discuss its advanced use for science and engineering domains. The morning session will introduce CUDA programming, motivate its use with many brief examples from different HPC domains, and discuss tools and programming environments. The afternoon will discuss advanced issues such as optimization and sophisticated algorithms/data structures, closing with real-world case studies from domain scientists using CUDA for computational biophysics, fluid dynamics, seismic imaging, and theoretical physics.
Posted in Developer Resources, Events | Tags: Conferences, High-Performance Computing, NVIDIA CUDA, Supercomputing, Tutorials & Courses | Write a comment
November 30th, 2009
HPMC is a small OpenGL/C/C++-library that extracts iso-surfaces of volumetric data directly on the GPU.
The library analyzes a lattice of scalar values describing a scalar field that is either stored in a Texture3D or can be accessed through an application-provided snippet of shader code. The output is a sequence of vertex positions and normals that form a triangulation of the iso-surface. HPMC provides traversal code to be included in an application vertex shader, which allows direct extraction in the vertex shader. Using the OpenGL transform feedback mechanism, the triangulation can be stored directly into a buffer object.
(C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel, High-speed Marching Cubes using Histogram Pyramids, Computer Graphics Forum 27 (8), 2008.)
Posted in Developer Resources, Research | Tags: Open Source, OpenGL, Volume Rendering, Volumetric Reconstruction | Write a comment
November 30th, 2009
MTGP is a new variant of the Mersenne Twister (MT) pseudorandom number generator introduced by Mutsuo Saito and Makoto Matsumoto in 2009. MTGP is designed to take advantage of some features of GPUs, such as parallel execution and hi-speed constant reference. It supports 32-bit and 64-bit integers, as well as single and double precision floating point as output.
MTGP v1.0 is available now.
Posted in Developer Resources, Research | Tags: Open Source, Random Number Generation | 1 Comment
November 30th, 2009
24th International Conference on Supercomputing (ICS’10)
June 1-4, 2010
Epochal Tsukuba (Tsukuba International Congress Center)
Tsukuba, Japan
Sponsored by ACM/SIGARCH
ICS is the premier international forum for the presentation of research results in high-performance computing systems. In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the largest high-tech and academic
city in Japan.
Papers are solicited on all aspects of research, development, and application of high-performance experimental and commercial systems. Special emphasis will be given to work that leads to better understanding of the implications of the new era of million-scale parallelism and Exa-scale performance; including (but not limited to): Read the rest of this entry »
Posted in Events, Research | Tags: Conferences, High-Performance Computing, Supercomputing | Write a comment
November 25th, 2009
ECCOMAS CFD 2010, one of the world’s most important conferences in the field of CFD, is proud to announce a mini-symposium on “GPU Computing in Computational Fluid Dynamics”, organised by Stefan Turek and Dominik Göddeke.
Contributions to this event are cordially invited and should include a tentative title and an extended abstract. Submissions are due no later than December 15 (via email to stefan.turek (at) math.tu-dortmund.de). For details, please contact Stefan Turek or Dominik Göddeke.
Support of this mini-symposium by German BMBF (SKALB project) is gratefully acknowledged.
Posted in Research | Tags: Call for Papers, CfD, Workshops | Write a comment
November 25th, 2009
This paper in the Proceedings of the Institution of Civil Engineers describes an application of GPGPU for flood risk modelling by a team based at JBA Consulting in the UK. The model described here has since been used to produce flood risk maps for several countries in Europe.
Abstract:
“Two-dimensional (2D) flood inundation modelling is now an important part of flood risk management practice. Research in the fields of computational hydraulics and numerical methods, allied with advances in computer technology and software design, have brought 2D models into mainstream use. Even so, the models are computationally demanding and can take a long time to run, especially for large areas and at high spatial resolutions (for instance 2 × 2 m or smaller grid cells). There is thus strong motivation to accelerate 2D model codes. This paper demonstrates the use of technology from the computer graphics industry to accelerate a 2D diffusion wave (non-inertial) floodplain model. Over the past decade the market for computer games has driven the development of very fast, relatively low-cost ‘graphical processing units’. In recent years there has been a growing interest in this high-performance graphics hardware for scientific and engineering applications. This work adapted a flood model algorithm to run on a commodity personal computer graphics card. The results of a benchmark urban flood simulation were reproduced and the model run time reduced from 18 h to 9·5 min.”
(Lamb, R., Crossley, A. and Waller, S. 2009. A fast two-dimensional floodplain inundation model. Proceedings of the Institution of Civil Engineers – Water Management, Volume 162, Issue 6, pages 363–370. DOI: 10.1680/wama.2009.162.6.363)
Posted in Research | Tags: Fluid Simulation, Papers | Write a comment
November 25th, 2009
Abstract:
Cellular-level agent based modelling is reliant on either sequential processing environments or expensive and largely unavailable PC grids. The GPU offers an alternative architecture for such systems, however the steep learning curve associated with the GPU’s data parallel architecture has previously limited the uptake of this emerging technology. In this paper we demonstrate a template driven agent architecture which provides a mapping of XML model specifications and C language scripting to optimised Compute Unified Device Architecture (CUDA) for the GPU. Our work is validated though the implementation of a Keratinocyte model using limited range message communication with non-linear time simulation steps to resolve intercellular forces. The performance gain achieved over existing modelling techniques reduces simulation times from hours to seconds. The improvement of simulation performance allows us to present a real-time visualisation technique which was previously unobtainable.
(Richmond Paul, Coakley Simon, Romano Daniela (2009), Cellular Level Agent Based Modelling on the Graphics Processing Unit, (Best Student Paper) Proc. of HiBi09 – High Performance Computational Systems Biology, 14-16 October 2009, Trento, Italy)
Posted in Research | Tags: Agent-Based Modeling, Computational Biology, Papers | Write a comment
November 25th, 2009
In this paper, Takizawa et al. have presented a tool named CheCUDA that is designed to checkpoint CUDA applications. As existing checkpoint/restart implementations do not support checkpointing the GPU status, CheCUDA hooks basic CUDA driver API calls in order to record the GPU status changes on the main memory. At checkpointing, CheCUDA stores the status changes in a file after copying all necessary data in the video memory to the main memory and then disabling the CUDA runtime. At restart, CheCUDA reads the file, re-initializes the CUDA runtime, and recovers the resources on GPUs so as to restart from the stored status. This paper demonstrates that a prototype implementation of CheCUDA can correctly checkpoint and restart a CUDA application written with basic APIs. This also indicates that CheCUDA can migrate a process from one PC to another even if the process uses a GPU. Accordingly, CheCUDA is useful not only to enhance the dependability of CUDA applications but also to enable dynamic task scheduling of CUDA applications required especially on heterogeneous GPU cluster systems. This paper also shows the timing overhead for checkpointing.
(Hiroyuki Takizawa, Katuto Sato, Kazuhiko Komatsu, and Hiroaki Kobayashi, CheCUDA: A Checkpoint/Restart Tool for CUDA Applications, to appear inProceedings of the Tenth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) 2009, Workshop on Ultra Performance and Dependable Acceleration Systems).
Posted in Research | Tags: NVIDIA CUDA, Papers, Scientific Computing, Tools | Write a comment
Page 5 of 56« First...«34567»...Last »