Direct N-body Kernels for Multicore Platforms

January 24th, 2010

From the abstract:

We present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including those based on the Intel Nehalem and AMD Barcelona systems, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDA Tesla C870 and C1060 GPU systems. We compare our implementations across platforms on a variety of proxy measures, including performance, coding complexity, and energy efficiency.

Nitin Arora, Aashay Shringarpure, and Richard Vuduc. “Direct n-body kernels for multicore platforms.” In Proc. Int’l. Conf. Parallel Processing (ICPP), Vienna, Austria, September 2009 (direct link to PDF).

GPU Simulations of Gravitational Many-body Problem and GPU Octrees

January 20th, 2010

This undergraduate thesis and poster by Kajuki Fujiwara and  Naohito Nakasato from the University of Aizu approach a common problem in astrophysics: the many-body problem, with both brute-force and hierarchical data structures for solving it on ATI GPUs.  Abstracts:

Fast Simulations of Gravitational Many-body Problem on RV770 GPU
Kazuki FujiwaraNaohito Nakasato (University of Aizu)
Abstract:

The gravitational many-body problem is a problem concerning the movement of bodies, which are interacting through gravity. However, solving the gravitational many-body problem with a CPU takes a lot of time due to O(N^2) computational complexity. In this paper, we show how to speed-up the gravitational many-body problem by using GPU. After extensive optimizations, the peak performance obtained so far is about 1 Tflops.

Oct-tree Method on GPU
N.Nakasato
Abstract:

The kd-tree is a fundamental tool in computer science. Among others, an application of the kd-tree search (oct-tree method) to fast evaluation of particle interactions and neighbor search is highly important since computational complexity of these problems are reduced from O(N^2) with a brute force method to O(N log N) with the tree method where N is a number of particles. In this paper, we present a parallel implementation of the tree method running on a graphic processor unit (GPU). We successfully run a simulation of structure formation in the universe very efficiently. On our system, which costs roughly $900, the run with N ~ 2.87×10^6 particles took 5.79 hours and executed 1.2×10^13 force evaluations in total. We obtained the sustained computing speed of 21.8 Gflops and the cost per Gflops of 41.6/Gflops that is two and half times better than the previous record in 2006.

Some older publications worth reading

January 17th, 2010

Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted on GPGPU.org  in a timely manner, please remember to use the news submission form.

  • Joshua A. Anderson, Chris D. Lorenz and Alex Travesset present and discuss molecular dynamics simulations and compare a single GPU against a 36-CPU cluster (General purpose molecular dynamics simulations fully implemented on graphics processing units, Journal of Computational Physics 227(10), May 2008, DOI 10.1016/j.jcp.2008.01.047).
  • Wen-mei Hwu et al. derive and discuss goals and concepts of programming models for fine-grained parallel architectures, from the point of view of both a programmer and a hardware /compiler designer, and analyze CUDA as one current representative  (Implicitly parallel programming models for thousand-core microprocessors, Proceedings of DAC’07, June 2007, DOI 10.1145/1278480.1278669).
  • Jeremy Sugerman et al. present GRAMPS, a prototype implementation of future graphics hardware that allows pipelines to be specified as graphs in software (GRAMPS: A Programming Model for Graphics Pipelines, ACM Transactions on Graphics 28(1), January 2009, DOI 10.1145/1477926.1477930).
  • William R. Mark discusses concepts of future graphics architectures in this contribution to the 2008 ACM Queue special issue on GPUs (Future graphics architectures, ACM Queue 6(2), March/April 2008,  DOI 10.1145/1365490.1365501).
  • BSGP by Qiming Hou et al. is a new programming language for general purpose GPU computing that achieves the same efficiency as well-tuned CUDA programs but makes code much easier to read, develop and maintain (BSGP: bulk-synchronous GPU programming, ACM Siggraph 2008, August 2008, DOI 10.1145/1399504.1360618).
  • Finally, Che et al. and Garland et al. survey the field of GPU computing and discuss many different application domains. These articles are, in addition to the ones we have collected on the developer pages, recommended to GPGPU newcomers.

CFP: FGC 2010 – The First International Workshop on Frontier of GPU Computing

January 15th, 2010

This workshop will be held in conjunction with CIT 2010, Bradford, UK, 29 June – 01 July, 2010.  From the announcement:

We are undergoing a new revolution in parallel processor technologies, especially the Graphics Processing Units. GPUs have become widely used nowadays to accelerate a broad range of applications, including computational finance, numerical computing, image/video processing, engineering simulations, quantum chemistry, just to name a few.
The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on the massively parallel GPU platforms, software development tools, optimization techniques, parallel algorithm design, and all kinds of successful applications. We solicit original and previously unpublished papers addressing research challenges and advances towards the design, implementation and evaluation of massively parallel GPU computing.

Read the rest of this entry »

CFP: Second USENIX Workshop on Hot Topics in Parallelism

December 20th, 2009

Second USENIX Workshop on Hot Topics in Parallelism (HotPar ’10)
June 14-15, Berkeley, CA

Website: http://www.usenix.org/event/hotpar10/

Following the tremendous success of HotPar ’09, the Second USENIX Workshop on Hot Topics in Parallelism (HotPar ’10) will once again bring together researchers and practitioners doing innovative work in the area of parallel computing. Multicore processors are the pervasive computing platform of the future. This trend is driven by limits on energy consumption in computer systems and the poor energy performance of conventional microprocessors. Parallel architectures can potentially mitigate these problems, but this new computer architecture will only be successful if languages, systems, and applications can take advantage of parallel hardware. Navigating this change will require new concurrency-friendly programming paradigms, new methods of application design, new structures for system software, and new models of interaction between applications, compilers, operating systems, and hardware.

Submissions

We request submissions of position papers that propose new directions for research of products in these areas, advocate non-traditional approaches to the problems engendered by parallelism, or potentially generate controversy and discussion. We encourage submissions from practitioners as well as from researchers. Read the rest of this entry »

CFP: Frontiers of GPU, Multi- and Many-Core Systems Workshop at CCGrid 2010

December 11th, 2009

Multi- and many-core microprocessors are being deployed in a broad spectrum of applications including Clusters, Clouds and Grids. Both conventional multi- and many-core processors, such as Intel Nehalem and IBM Power7 processors, and unconventional many-core processors, such as NVIDIA Tesla and AMD FireStream GPUs, hold the promise of increasing performance through parallelism. However, GPU approaches in parallelism are distinctly different from those of conventional multi- and many-core processors, which raises new challenges: For example, how do we optimize applications for conventional multi- and many-core processors? How do we reengineer applications to take advantage of GPUs’ tremendous computing power in a reasonable cost-benefit ratio? What are effective ways of using GPUs as accelerators? The goals of this workshop are to discuss these and other issues and bring together developers of application algorithms and experts in utilizing multi- and many-core processors. Accepted papers will be published in the CCGRID proceedings. Selected papers will be published in a special issue of the Journal Concurrency and Computation: Practice and Experience.

Topics of interests include (but not limited to): Read the rest of this entry »

Using NVIDIA GPUs and PyCUDA, MIT and Harvard researchers demonstrate a better way for computers to ‘see’

December 8th, 2009

From: http://web.mit.edu/press/2009/visual-systems.html

Taking inspiration from genetic screening techniques, researchers from MIT and Harvard have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance gaming hardware.

The neural processing involved in visually recognizing even the simplest object in a natural environment is profound — and profoundly difficult to mimic. Neuroscientists have made broad advances in understanding the visual system, but much of the inner workings of biologically based systems remain a mystery.

Using Graphics Processing Units (GPUs) — the same technology video game designers use to render life-like graphics — MIT and Harvard researchers are now making progress faster than ever before. “We made a powerful computing system that delivers over hundred fold speed-ups relative to conventional methods,” said Nicolas Pinto, a PhD candidate in James DiCarlo’s lab at the McGovern Institute for Brain Research at MIT. “With this extra computational power, we can discover new vision models that traditional methods miss.” Pinto co-authored the PLoS study with David Cox of the Visual Neuroscience Group at the Rowland Institute at Harvard.

Finding a better way for computers to “see” from Cox Lab @ Rowland Institute on Vimeo.
Read the rest of this entry »

Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications

December 8th, 2009

Abstract:

GPUs have recently evolved into very fast parallel coprocessors capable of executing general-purpose computations extremely efficiently. At the same time, multicore CPUs evolution continued and today’s CPUs have 4-8 cores. These two trends, however, have followed independent paths in the sense that we are aware of very few works that consider both devices cooperating to solve general computations. In this paper we investigate the coordinated use of CPU and GPU to improve efficiency of applications even further than using either device independently. We use Anthill runtime environment, a data-flow oriented framework in which applications are decomposed into a set of event-driven filters, where for each event, the runtime system can use either GPU or CPU for its processing. For evaluation, we use a histopathology application that uses image analysis techniques to classify tumor images for neuroblastoma prognosis. Our experimental environment includes dual and octa-core machines, augmented with GPUs and we evaluate our approach’s performance for standalone and distributed executions. Our experiments show that a pure GPU optimization of the application achieved a factor of 15 to 49 times improvement over the single-core CPU version, depending on the versions of the CPUs and GPUs. We also show that the execution can be further reduced by a factor of about 2 by using our runtime system that effectively choreographs the execution to run cooperatively both on GPU and on a single core of CPU. We improve on that by adding more cores, all of which were previously neglected or used ineffectively. In addition, the evaluation on a distributed environment has shown near linear scalability to multiple hosts.

(George Teodoro, Rafael Sachetto, Olcay Sertel, Metin Gurcan, Wagner Meira Jr., Umit Catalyurek, and Renato Ferreira. Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications. IEEE Cluster 2009. New Orleans, LA, USA. PresentationPaper.)

CUDAEASY – a GPU Accelerated Cosmological Lattice Program

December 8th, 2009

Abstract:

This paper presents, to the author’s knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA’s Compute Unified Device Architecture (CUDA) and compare the performance to other similar programs in chaotic inflation models. We report speedups between one and two orders of magnitude depending on the used hardware and software while achieving small errors in single precision. Simulations that used to last roughly one day to compute can now be done in hours and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY and users of the aforementioned program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available under the GNU General Public License.

The program is freely available at http://www.physics.utu.fi/theory/particlecosmology/cudaeasy/

(Jani Sainio. “CUDAEASY – a GPU Accelerated Cosmological Lattice Program”. submitted to Computer Physics Communications (under review). November 2009.)

HPMC open-source GPU volumetric iso-surface extraction library

November 30th, 2009

HPMC is a small OpenGL/C/C++-library that extracts iso-surfaces of volumetric data directly on the GPU.

The library analyzes a lattice of scalar values describing a scalar field that is either stored in a Texture3D or can be accessed through an application-provided snippet of shader code. The output is a sequence of vertex positions and normals that form a triangulation of the iso-surface. HPMC provides traversal code to be included in an application vertex shader, which allows direct extraction in the vertex shader. Using the OpenGL transform feedback mechanism, the triangulation can be stored directly into a buffer object.

(C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel, High-speed Marching Cubes using Histogram Pyramids, Computer Graphics Forum 27 (8), 2008.)

Page 28 of 57« First...1020...2627282930...4050...Last »