Multi- and many-core microprocessors are being deployed in a broad spectrum of applications, including clusters, clouds, and grids. Both conventional multi- and many-core processors, such as the Intel Nehalem and IBM Power7, and unconventional many-core processors, such as NVIDIA Tesla and AMD FireStream GPUs, hold the promise of increasing performance through parallelism. However, the GPU approach to parallelism differs markedly from that of conventional multi- and many-core processors, which raises new challenges: For example, how do we optimize applications for conventional multi- and many-core processors? How do we reengineer applications to exploit GPUs’ tremendous computing power at a reasonable cost-benefit ratio? What are effective ways of using GPUs as accelerators? The goals of this workshop are to discuss these and other issues and to bring together developers of application algorithms and experts in utilizing multi- and many-core processors. Accepted papers will be published in the CCGRID proceedings. Selected papers will be published in a special issue of the journal Concurrency and Computation: Practice and Experience.
Taking inspiration from genetic screening techniques, researchers from MIT and Harvard have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance gaming hardware.
The neural processing involved in visually recognizing even the simplest object in a natural environment is profound — and profoundly difficult to mimic. Neuroscientists have made broad advances in understanding the visual system, but much of the inner workings of biologically based systems remain a mystery.
Using Graphics Processing Units (GPUs) — the same technology video game designers use to render life-like graphics — MIT and Harvard researchers are now making progress faster than ever before. “We made a powerful computing system that delivers over hundredfold speed-ups relative to conventional methods,” said Nicolas Pinto, a PhD candidate in James DiCarlo’s lab at the McGovern Institute for Brain Research at MIT. “With this extra computational power, we can discover new vision models that traditional methods miss.” Pinto co-authored the PLoS study with David Cox of the Visual Neuroscience Group at the Rowland Institute at Harvard.
GPUs have recently evolved into very fast parallel coprocessors capable of executing general-purpose computations extremely efficiently. At the same time, multicore CPUs have continued to evolve, and today’s CPUs have 4–8 cores. These two trends, however, have followed independent paths, in the sense that we are aware of very few works in which both devices cooperate to solve general computations. In this paper we investigate the coordinated use of the CPU and GPU to improve the efficiency of applications beyond what either device achieves independently. We use the Anthill runtime environment, a data-flow oriented framework in which applications are decomposed into a set of event-driven filters; for each event, the runtime system can use either the GPU or the CPU for its processing. For evaluation, we use a histopathology application that applies image analysis techniques to classify tumor images for neuroblastoma prognosis. Our experimental environment includes dual- and octa-core machines augmented with GPUs, and we evaluate our approach’s performance for standalone and distributed executions. Our experiments show that a pure GPU optimization of the application achieved a 15- to 49-fold improvement over the single-core CPU version, depending on the CPU and GPU models. We also show that execution time can be reduced by a further factor of about 2 using our runtime system, which choreographs the execution to run cooperatively on the GPU and a single CPU core, and improved again by adding the remaining cores, which were previously idle or used ineffectively. In addition, evaluation in a distributed environment shows near-linear scalability to multiple hosts.
(George Teodoro, Rafael Sachetto, Olcay Sertel, Metin Gurcan, Wagner Meira Jr., Umit Catalyurek, and Renato Ferreira. Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications. IEEE Cluster 2009. New Orleans, LA, USA.)
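The core idea of the abstract — an event-driven filter whose events can be served by whichever device is free — can be sketched as a demand-driven dispatcher. The sketch below is a minimal CPU-only illustration of that scheduling pattern, not Anthill itself; the device labels and the `run_filter`/`process` names are hypothetical.

```python
import queue
import threading

def run_filter(events, devices, process):
    """Dispatch each event to whichever device worker becomes free first,
    mimicking (in spirit) Anthill's per-event choice of CPU or GPU."""
    pending = queue.Queue()
    for e in events:
        pending.put(e)
    results = {}
    lock = threading.Lock()

    def worker(device):
        # Each worker pulls events on demand until the queue is drained,
        # so faster devices naturally process more events.
        while True:
            try:
                e = pending.get_nowait()
            except queue.Empty:
                return
            r = process(device, e)
            with lock:
                results[e] = (device, r)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Toy "processing": square the event payload; devices are just labels here.
out = run_filter(range(8), ["cpu0", "cpu1", "gpu0"], lambda dev, e: e * e)
```

Demand-driven assignment like this is what lets a slow device help without becoming a bottleneck: no event is committed to a device before that device is actually free.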
This paper presents, to the author’s knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA’s Compute Unified Device Architecture (CUDA) and compare the performance to other similar programs in chaotic inflation models. We report speedups of one to two orders of magnitude, depending on the hardware and software used, while achieving small errors in single precision. Simulations that previously took roughly one day to compute can now be done in hours, and this gap is expected to widen in the future. The program has been written in the spirit of LATTICEEASY, and users of that program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available under the GNU General Public License.
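The kind of lattice evolution CUDAEASY accelerates can be illustrated with a toy version of the problem: a single scalar field on a periodic 1-D lattice obeying phi'' + 3H phi' - a⁻² lap(phi) + m² phi = 0. This NumPy sketch is illustrative only — CUDAEASY evolves interacting fields on a 3-D lattice with proper expansion dynamics — and the `evolve_field` name and its parameters are assumptions for the example.

```python
import numpy as np

def evolve_field(phi, pi, a, H, m, dx, dt, steps):
    """Leapfrog-style update of a scalar field phi (with momentum pi) on a
    periodic 1-D lattice, using a toy equation of motion with Hubble
    friction 3H*pi and a fixed scale factor a."""
    for _ in range(steps):
        # Second-order finite-difference Laplacian with periodic wrap.
        lap = (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1)) / dx**2
        pi += dt * (lap / a**2 - m**2 * phi - 3 * H * pi)
        phi += dt * pi
    return phi, pi

# Evolve a single sine mode; Hubble friction slowly damps it.
phi = np.sin(2 * np.pi * np.arange(64) / 64)
pi = np.zeros(64)
phi, pi = evolve_field(phi, pi, a=1.0, H=0.1, m=1.0, dx=1.0, dt=0.01, steps=100)
```

The stencil structure (each site reads only its neighbours) is exactly what makes such simulations map well onto the GPU's data-parallel execution model.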
HPMC is a small OpenGL/C/C++-library that extracts iso-surfaces of volumetric data directly on the GPU.
The library analyzes a lattice of scalar values describing a scalar field that is either stored in a Texture3D or can be accessed through an application-provided snippet of shader code. The output is a sequence of vertex positions and normals that form a triangulation of the iso-surface. HPMC provides traversal code to be included in an application vertex shader, which allows direct extraction in the vertex shader. Using the OpenGL transform feedback mechanism, the triangulation can be stored directly into a buffer object.
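The first stage of any GPU iso-surface extractor is classifying which lattice cells the surface actually passes through, followed by a stream compaction so each output primitive can locate its source cell (HPMC uses a HistoPyramid for this on the GPU). The NumPy sketch below shows the CPU equivalent of that classification-plus-compaction step; the `active_cells` name is an assumption, not HPMC's API.

```python
import numpy as np

def active_cells(scalar, iso):
    """Classify each cell of a 3-D scalar lattice: a cell is 'active' if
    the iso-value lies between its minimum and maximum corner sample, so
    the iso-surface must pass through it. Also return an exclusive prefix
    sum giving each active cell its compacted output slot."""
    s = scalar
    # Gather the 8 corner samples of every cell via shifted views.
    corners = np.stack([s[i:s.shape[0]-1+i, j:s.shape[1]-1+j, k:s.shape[2]-1+k]
                        for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    active = (lo <= iso) & (iso <= hi)
    # Exclusive prefix sum: the same compaction a HistoPyramid traversal
    # performs, mapping output index -> active cell index.
    flat = active.ravel()
    offsets = np.cumsum(flat) - flat
    return active, offsets

# A sphere of radius 3: only a shell of cells is active.
x, y, z = np.mgrid[0:8, 0:8, 0:8]
field = np.sqrt((x - 3.5)**2 + (y - 3.5)**2 + (z - 3.5)**2)
active, offsets = active_cells(field, iso=3.0)
```

On the GPU the prefix sum is what lets every thread write its triangle vertices to a dense, gap-free buffer (e.g. via transform feedback) without any synchronization.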
MTGP is a new variant of the Mersenne Twister (MT) pseudorandom number generator introduced by Mutsuo Saito and Makoto Matsumoto in 2009. MTGP is designed to take advantage of features of GPUs such as parallel execution and high-speed constant memory references. It supports 32-bit and 64-bit integers, as well as single- and double-precision floating point, as output.
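The practical problem MTGP addresses is giving each GPU thread block its own statistically independent MT stream (it ships many pre-computed parameter sets, one per block). As a CPU-side analogue, NumPy's `SeedSequence.spawn` produces independent child streams for a plain MT19937; the sketch below illustrates the per-block-stream idea only, and the `make_streams` helper is hypothetical.

```python
import numpy as np

def make_streams(seed, n_threads):
    """Create n independent pseudorandom streams, one per (simulated)
    thread block, each backed by a Mersenne Twister state. SeedSequence
    spawning guarantees the children are statistically independent, the
    role MTGP's distinct per-block parameter sets play on the GPU."""
    root = np.random.SeedSequence(seed)
    return [np.random.Generator(np.random.MT19937(s))
            for s in root.spawn(n_threads)]

streams = make_streams(42, 4)
samples = [g.random(3) for g in streams]  # each stream draws independently
```

Independent streams matter because naive per-thread seeding (e.g. seed + thread id) can produce correlated sequences; both MTGP and `SeedSequence` are designed to rule that out.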
ICS is the premier international forum for the presentation of research results in high-performance computing systems. In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the largest high-tech and academic city in Japan.
Papers are solicited on all aspects of research, development, and application of high-performance experimental and commercial systems. Special emphasis will be given to work that leads to a better understanding of the implications of the new era of million-scale parallelism and exascale performance, including (but not limited to):
ECCOMAS CFD 2010, one of the world’s most important conferences in the field of CFD, is proud to announce a mini-symposium on “GPU Computing in Computational Fluid Dynamics”, organised by Stefan Turek and Dominik Göddeke.
Contributions to this event are cordially invited and should include a tentative title and an extended abstract. Submissions are due no later than December 15 (via email to stefan.turek (at) math.tu-dortmund.de). For details, please contact Stefan Turek or Dominik Göddeke.
Support of this mini-symposium by the German BMBF (SKALB project) is gratefully acknowledged.
This paper in the Proceedings of the Institution of Civil Engineers describes an application of GPGPU for flood risk modelling by a team based at JBA Consulting in the UK. The model described here has since been used to produce flood risk maps for several countries in Europe.
“Two-dimensional (2D) flood inundation modelling is now an important part of flood risk management practice. Research in the fields of computational hydraulics and numerical methods, allied with advances in computer technology and software design, have brought 2D models into mainstream use. Even so, the models are computationally demanding and can take a long time to run, especially for large areas and at high spatial resolutions (for instance 2 × 2 m or smaller grid cells). There is thus strong motivation to accelerate 2D model codes. This paper demonstrates the use of technology from the computer graphics industry to accelerate a 2D diffusion wave (non-inertial) floodplain model. Over the past decade the market for computer games has driven the development of very fast, relatively low-cost ‘graphical processing units’. In recent years there has been a growing interest in this high-performance graphics hardware for scientific and engineering applications. This work adapted a flood model algorithm to run on a commodity personal computer graphics card. The results of a benchmark urban flood simulation were reproduced and the model run time reduced from 18 h to 9·5 min.”
(Lamb, R., Crossley, A. and Waller, S. 2009. A fast two-dimensional floodplain inundation model. Proceedings of the Institution of Civil Engineers – Water Management, Volume 162, Issue 6, pages 363–370. DOI: 10.1680/wama.2009.162.6.363)
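A diffusion-wave (non-inertial) flood model of the kind described above advances water depths on a raster grid by computing inter-cell discharges from Manning's equation, with the water-surface slope as the driving gradient. The sketch below is a minimal explicit update under those assumptions — it is not the JBA/Lamb et al. scheme itself (their wetting/drying and stability handling are more involved), and the `diffusion_wave_step` name and signature are invented for the example.

```python
import numpy as np

def diffusion_wave_step(h, z, n_mann, dx, dt):
    """One explicit step of a toy 2-D diffusion-wave floodplain model.
    h: water depth grid, z: bed elevation grid, n_mann: Manning roughness.
    Discharge between neighbouring cells: q = d^(5/3) * sqrt(|S|) / n,
    where S is the water-surface slope and d the upstream flow depth."""
    w = h + z                                 # water-surface elevation
    dh = np.zeros_like(h)
    for axis in (0, 1):                       # fluxes along each grid axis
        w0 = np.moveaxis(w, axis, 0)
        h0 = np.moveaxis(h, axis, 0)
        d0 = np.moveaxis(dh, axis, 0)         # view: writes land in dh
        slope = (w0[:-1] - w0[1:]) / dx       # + means flow towards i+1
        depth = np.where(slope >= 0, h0[:-1], h0[1:])   # upstream depth
        q = np.sign(slope) * depth**(5 / 3) * np.sqrt(np.abs(slope)) / n_mann
        d0[:-1] -= q * dt / dx
        d0[1:] += q * dt / dx
    return np.maximum(h + dh, 0.0)            # depths stay non-negative

# Toy dam break: a block of water on the left half spreads rightwards.
h = np.zeros((4, 10))
h[:, :5] = 1.0
z = np.zeros_like(h)
h1 = diffusion_wave_step(h, z, n_mann=0.03, dx=1.0, dt=0.01)
```

Because every cell's update reads only its four neighbours, this stencil parallelizes trivially — which is why mapping it onto a graphics card yields the large speedups the paper reports.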
Cellular-level agent-based modelling has relied on either sequential processing environments or expensive and largely unavailable PC grids. The GPU offers an alternative architecture for such systems; however, the steep learning curve associated with the GPU’s data-parallel architecture has previously limited the uptake of this emerging technology. In this paper we demonstrate a template-driven agent architecture which maps XML model specifications and C language scripting to optimised Compute Unified Device Architecture (CUDA) code for the GPU. Our work is validated through the implementation of a keratinocyte model using limited-range message communication with non-linear time simulation steps to resolve intercellular forces. The performance gain over existing modelling techniques reduces simulation times from hours to seconds. This improvement in simulation performance allows us to present a real-time visualisation technique which was previously unobtainable.
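The "limited-range message communication" mentioned in the abstract is typically implemented by binning agents into a spatial grid so each agent only reads messages from its own and adjacent bins. The sketch below shows that partitioning idea in plain Python; the `limited_range_force` helper and its simple repulsive force are stand-ins invented for illustration, not the paper's keratinocyte force model.

```python
import numpy as np
from collections import defaultdict

def limited_range_force(pos, radius, k=1.0):
    """Limited-range messaging via spatial binning: each agent reads the
    positions ('messages') only of agents in its own and the 8 adjacent
    bins, so interaction cost stays local instead of O(n^2). A linear
    repulsive force within `radius` stands in for the real force model."""
    bins = defaultdict(list)
    for i, p in enumerate(pos):
        bins[tuple((p // radius).astype(int))].append(i)
    forces = np.zeros_like(pos)
    for i, p in enumerate(pos):
        cx, cy = (p // radius).astype(int)
        for bx in (cx - 1, cx, cx + 1):
            for by in (cy - 1, cy, cy + 1):
                for j in bins.get((bx, by), []):
                    if j == i:
                        continue
                    d = p - pos[j]
                    r = np.linalg.norm(d)
                    if 0 < r < radius:
                        # Repulsion grows as agents overlap more.
                        forces[i] += k * (radius - r) * d / r
    return forces

# Two nearby agents repel each other; a distant third feels nothing.
pos = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
f = limited_range_force(pos, radius=1.0)
```

On the GPU the same binning lets each thread scan a small, fixed neighbourhood of message lists, which is what keeps intercellular force resolution fast enough for real-time visualisation.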