Occasionally, we receive news submissions pointing us to interesting older papers that somehow slipped by without our notice. This post collects a few of those. If you want your work to be posted on GPGPU.org in a timely manner, please remember to use the news submission form.
Joshua A. Anderson, Chris D. Lorenz and Alex Travesset present and discuss molecular dynamics simulations and compare a single GPU against a 36-CPU cluster (General purpose molecular dynamics simulations fully implemented on graphics processing units, Journal of Computational Physics 227(10), May 2008, DOI 10.1016/j.jcp.2008.01.047).
Wen-mei Hwu et al. derive and discuss goals and concepts of programming models for fine-grained parallel architectures, from the point of view of both a programmer and a hardware /compiler designer, and analyze CUDA as one current representative (Implicitly parallel programming models for thousand-core microprocessors, Proceedings of DAC’07, June 2007, DOI 10.1145/1278480.1278669).
Jeremy Sugerman et al. present GRAMPS, a prototype implementation of future graphics hardware that allows pipelines to be specified as graphs in software (GRAMPS: A Programming Model for Graphics Pipelines, ACM Transactions on Graphics 28(1), January 2009, DOI 10.1145/1477926.1477930).
William R. Mark discusses concepts of future graphics architectures in this contribution to the 2008 ACM Queue special issue on GPUs (Future graphics architectures, ACM Queue 6(2), March/April 2008, DOI 10.1145/1365490.1365501).
BSGP by Qiming Hou et al. is a new programming language for general purpose GPU computing that achieves the same efficiency as well-tuned CUDA programs but makes code much easier to read, develop and maintain (BSGP: bulk-synchronous GPU programming, ACM Siggraph 2008, August 2008, DOI 10.1145/1399504.1360618).
This workshop will be held in conjunction with CIT 2010, Bradford, UK, 29 June – 01 July, 2010. From the announcement:
We are undergoing a new revolution in parallel processor technologies, especially the Graphics Processing Units. GPUs have become widely used nowadays to accelerate a broad range of applications, including computational finance, numerical computing, image/video processing, engineering simulations, quantum chemistry, just to name a few.
The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on the massively parallel GPU platforms, software development tools, optimization techniques, parallel algorithm design, and all kinds of successful applications. We solicit original and previously unpublished papers addressing research challenges and advances towards the design, implementation and evaluation of massively parallel GPU computing.
Following the tremendous success of HotPar ’09, the Second USENIX Workshop on Hot Topics in Parallelism (HotPar ’10) will once again bring together researchers and practitioners doing innovative work in the area of parallel computing. Multicore processors are the pervasive computing platform of the future. This trend is driven by limits on energy consumption in computer systems and the poor energy performance of conventional microprocessors. Parallel architectures can potentially mitigate these problems, but this new computer architecture will only be successful if languages, systems, and applications can take advantage of parallel hardware. Navigating this change will require new concurrency-friendly programming paradigms, new methods of application design, new structures for system software, and new models of interaction between applications, compilers, operating systems, and hardware.
We request submissions of position papers that propose new directions for research of products in these areas, advocate non-traditional approaches to the problems engendered by parallelism, or potentially generate controversy and discussion. We encourage submissions from practitioners as well as from researchers. Read the rest of this entry »
Multi- and many-core microprocessors are being deployed in a broad spectrum of applications including Clusters, Clouds and Grids. Both conventional multi- and many-core processors, such as Intel Nehalem and IBM Power7 processors, and unconventional many-core processors, such as NVIDIA Tesla and AMD FireStream GPUs, hold the promise of increasing performance through parallelism. However, GPU approaches in parallelism are distinctly different from those of conventional multi- and many-core processors, which raises new challenges: For example, how do we optimize applications for conventional multi- and many-core processors? How do we reengineer applications to take advantage of GPUs’ tremendous computing power in a reasonable cost-benefit ratio? What are effective ways of using GPUs as accelerators? The goals of this workshop are to discuss these and other issues and bring together developers of application algorithms and experts in utilizing multi- and many-core processors. Accepted papers will be published in the CCGRID proceedings. Selected papers will be published in a special issue of the Journal Concurrency and Computation: Practice and Experience.
Taking inspiration from genetic screening techniques, researchers from MIT and Harvard have demonstrated a way to build better artificial visual systems with the help of low-cost, high-performance gaming hardware.
The neural processing involved in visually recognizing even the simplest object in a natural environment is profound — and profoundly difficult to mimic. Neuroscientists have made broad advances in understanding the visual system, but much of the inner workings of biologically based systems remain a mystery.
Using Graphics Processing Units (GPUs) — the same technology video game designers use to render life-like graphics — MIT and Harvard researchers are now making progress faster than ever before. “We made a powerful computing system that delivers over hundred fold speed-ups relative to conventional methods,” said Nicolas Pinto, a PhD candidate in James DiCarlo’s lab at the McGovern Institute for Brain Research at MIT. “With this extra computational power, we can discover new vision models that traditional methods miss.” Pinto co-authored the PLoS study with David Cox of the Visual Neuroscience Group at the Rowland Institute at Harvard.
GPUs have recently evolved into very fast parallel coprocessors capable of executing general-purpose computations extremely efficiently. At the same time, multicore CPUs evolution continued and today’s CPUs have 4-8 cores. These two trends, however, have followed independent paths in the sense that we are aware of very few works that consider both devices cooperating to solve general computations. In this paper we investigate the coordinated use of CPU and GPU to improve efficiency of applications even further than using either device independently. We use Anthill runtime environment, a data-flow oriented framework in which applications are decomposed into a set of event-driven filters, where for each event, the runtime system can use either GPU or CPU for its processing. For evaluation, we use a histopathology application that uses image analysis techniques to classify tumor images for neuroblastoma prognosis. Our experimental environment includes dual and octa-core machines, augmented with GPUs and we evaluate our approach’s performance for standalone and distributed executions. Our experiments show that a pure GPU optimization of the application achieved a factor of 15 to 49 times improvement over the single-core CPU version, depending on the versions of the CPUs and GPUs. We also show that the execution can be further reduced by a factor of about 2 by using our runtime system that effectively choreographs the execution to run cooperatively both on GPU and on a single core of CPU. We improve on that by adding more cores, all of which were previously neglected or used ineffectively. In addition, the evaluation on a distributed environment has shown near linear scalability to multiple hosts.
(George Teodoro, Rafael Sachetto, Olcay Sertel, Metin Gurcan, Wagner Meira Jr., Umit Catalyurek, and Renato Ferreira. Coordinating the Use of GPU and CPU for Improving Performance of Compute Intensive Applications. IEEE Cluster 2009. New Orleans, LA, USA. Presentation. Paper.)
This paper presents, to the author’s knowledge, the first graphics processing unit (GPU) accelerated program that solves the evolution of interacting scalar fields in an expanding universe. We present the implementation in NVIDIA’s Compute Unified Device Architecture (CUDA) and compare the performance to other similar programs in chaotic inflation models. We report speedups between one and two orders of magnitude depending on the used hardware and software while achieving small errors in single precision. Simulations that used to last roughly one day to compute can now be done in hours and this difference is expected to increase in the future. The program has been written in the spirit of LATTICEEASY and users of the aforementioned program should find it relatively easy to start using CUDAEASY in lattice simulations. The program is available under the GNU General Public License.
HPMC is a small OpenGL/C/C++-library that extracts iso-surfaces of volumetric data directly on the GPU.
The library analyzes a lattice of scalar values describing a scalar field that is either stored in a Texture3D or can be accessed through an application-provided snippet of shader code. The output is a sequence of vertex positions and normals that form a triangulation of the iso-surface. HPMC provides traversal code to be included in an application vertex shader, which allows direct extraction in the vertex shader. Using the OpenGL transform feedback mechanism, the triangulation can be stored directly into a buffer object.
MTGP is a new variant of the Mersenne Twister (MT) pseudorandom number generator introduced by Mutsuo Saito and Makoto Matsumoto in 2009. MTGP is designed to take advantage of some features of GPUs, such as parallel execution and hi-speed constant reference. It supports 32-bit and 64-bit integers, as well as single and double precision floating point as output.
ICS is the premier international forum for the presentation of research results in high-performance computing systems. In 2010 the conference will be held at the Epochal Tsukuba (Tsukuba International Congress Center) in Tsukuba City, the largest high-tech and academic
city in Japan.
Papers are solicited on all aspects of research, development, and application of high-performance experimental and commercial systems. Special emphasis will be given to work that leads to better understanding of the implications of the new era of million-scale parallelism and Exa-scale performance; including (but not limited to): Read the rest of this entry »