March 31st, 2009
Posted in Developer Resources, Research | Tags: Image Processing, Libraries, Papers, Signal Processing
March 31st, 2009
The Interdisciplinary Centre for Mathematical and Computational Modelling and the Institute of Informatics at the University of Warsaw held a mini-conference and workshop called “Applications of Graphic Processors in High Performance Computing” on March 19-21 in Warsaw, Poland. The speakers were authors of several publications on methods for GPU programming and on applications of GPUs in the analysis of medical data, computational fluid mechanics, bioinformatics and finite element computations. A hands-on workshop on CUDA programming was offered to a limited number of conference participants. More information is available at the AGPinHPC conference website.
Posted in Events | Tags: High-Performance Computing, Workshops
March 31st, 2009
This workshop is organized in conjunction with INFORMATIK 2009, the 39th annual meeting of the Gesellschaft für Informatik e.V. (GI). The one-day event will take place in Lübeck, Germany, during INFORMATIK 2009 (September 28th – October 2nd, 2009). The workshop will include tutorials, refereed sessions, invited talks, and an open discussion session on future developments. Submissions are encouraged in all areas of Massively-Parallel Computational Biology on GPUs (Graphics Processing Units), including but not limited to:
- Parallel and massively-parallel Programming and Algorithms
- Algorithmic Aspects of Computational Biology
- Applications and Implementations on GPUs
The submission deadline is April 26, 2009. For more information visit the BioGPU 2009 Website.
Posted in Events | Tags: Computational Biology, Scientific Computing, Workshops
March 23rd, 2009
Submissions of relevant research on GPU Computing are invited to this minisymposium, organized as part of the 2009 International Conference on Parallel Processing and Applied Mathematics (PPAM 2009, Wroclaw, Poland, September 13-16, 2009). The submission deadline is April 10, 2009. Topics of interest include, but are not limited to, the development of techniques and tools (e.g., compilers and high-level application programming interfaces) that improve the programmability of GPUs, as well as practical demonstrations of the potential of GPUs in solving scientific, engineering and commercial problems. The best papers presented at the minisymposium will be considered for a special issue of Concurrency and Computation: Practice & Experience. For further details, visit GPUCOMP2009.
Posted in Events | Tags: Conferences
March 17th, 2009
Slides are now available for the minisymposium “Scientific Computing on Emerging Many-Core architectures”, held in conjunction with the SIAM Conference on Computational Science and Engineering 2009 (SIAM CSE’09, Miami, Florida). The minisymposium, organised by Mike Giles, Dominik Göddeke and Stefan Turek, focused on opportunities and challenges for scientific computing on novel many-core architectures, in particular IBM’s Cell processor and GPUs from NVIDIA, AMD and Intel. The talks covered a range of application areas, as well as the development of libraries and other tools to simplify the programming of many-core processors. (Minisymposium: Scientific Computing on Emerging Many-Core architectures)
Posted in Events, Research | Tags: Conferences, Many-core, Scientific Computing
March 11th, 2009
In this ClusterMonkey article, Andrew Humber, Senior PR Manager for Tesla and CUDA Technologies at NVIDIA Corporation, summarizes the events that made 2008 a truly exciting year for GPU Computing. (A Year in Review from the NVIDIA Tesla Team, ClusterMonkey)
Posted in Press | Tags: NVIDIA CUDA, NVIDIA Tesla, OpenCL
March 11th, 2009
Abstract:
In this paper, the authors present a library, named Sapporo, which closely emulates the GRAPE-6 API. The library is written in CUDA and implements the most common functions used in N-body codes that support GRAPE-6. As a result, such codes will be able to use Sapporo without modification to their source code. The library also supports the use of multiple GPUs per host. The authors carried out a series of systematic tests to assess the performance, accuracy and ability of the library to handle a realistic N-body problem. They found that the performance of the library with a single G80/G92 GPU is a factor of two higher than that of GRAPE-6A(BLX) PCI(X)-cards, and that the sustained performance with 2x GeForce 9800GX2 cards is on par with a 32-chip GRAPE-6 system (about 800 GFlop/s). The accuracy of the library is comparable to that of GRAPE-6 hardware, and its ability to correctly solve a realistic N-body problem makes it an alternative to GRAPE-6 special-purpose hardware.
(Evghenii Gaburov, Stefan Harfst and Simon Portegies Zwart, SAPPORO: A way to turn your graphics cards into a GRAPE-6, Submitted to New Astronomy)
Posted in Research | Tags: N-Body, NVIDIA CUDA, Papers, Physics Simulation
March 11th, 2009
Abstract:
This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors. As a case study, we design and implement the “DCGN” API on NVIDIA GPUs that is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. In order to facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, somewhere between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that with innovations in chipsets and drivers, this overhead will be mitigated, yielding performance similar to typical CPU-based MPI implementations while providing fully-dynamic communication.
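The general idea of sleep-based polling mentioned in the abstract can be illustrated with a small CPU-only sketch. This is not the DCGN API or the authors' implementation: the `Mailbox` class, its `post` and `poll` methods, and the `interval` and `timeout` parameters are hypothetical names chosen for illustration, and ordinary Python threads stand in for the CPU and GPU sides.

```python
import threading
import time


class Mailbox:
    """Hypothetical message store keyed by rank (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = {}  # rank -> list of pending payloads

    def post(self, rank, payload):
        """Store a message for the given rank."""
        with self._lock:
            self._messages.setdefault(rank, []).append(payload)

    def poll(self, rank, interval=0.001, timeout=1.0):
        """Check for a message, sleeping between checks instead of spinning."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                queue = self._messages.get(rank)
                if queue:
                    return queue.pop(0)  # oldest message first
            time.sleep(interval)  # yield the CPU until the next check
        raise TimeoutError(f"no message for rank {rank}")


if __name__ == "__main__":
    box = Mailbox()
    # A sender delivers a message to rank 0 shortly after the receiver starts polling.
    sender = threading.Timer(0.01, box.post, args=(0, "hello"))
    sender.start()
    print(box.poll(0))
    sender.join()
```

Sleeping between checks trades a little latency for lower CPU load, which is one plausible source of the kind of overhead the abstract reports.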
(Jeff A. Stuart and John D. Owens, Message Passing on Data-Parallel Architectures, Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium)
Posted in Research | Tags: APIs, NVIDIA CUDA, Papers, Programming Languages, Supercomputing, Tools
March 11th, 2009
This article by Jeff Layton at ClusterMonkey summarizes the history of GPU Computing in terms of high-level programming languages and abstractions, from the early days of GPGPU programming using graphics APIs, to Stream, CUDA and OpenCL. The second half of the article provides an introduction to the PGI 8.0 Technology Preview, which allows the use of pragmas to automatically parallelize and run compute-intensive kernels in standard C and Fortran code on accelerators like GPUs. (GPU Programming For the Rest Of Us, Jeff Layton, ClusterMonkey.net)
Posted in Developer Resources, Press | Tags: APIs, Programming Languages, Tools
March 1st, 2009
This IPDPS 2009 paper by Nadathur Satish, Mark Harris, and Michael Garland describes the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by NVIDIA CUDA. The radix sort described is the fastest GPU sort, and the merge sort described is the fastest comparison-based GPU sort, reported in the literature. The radix sort is up to 4 times faster than the graphics-based GPUSort and more than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine. To achieve this performance, the authors carefully design the algorithms to expose substantial fine-grained parallelism and decompose the computation into independent tasks that perform minimal global communication. They exploit the high-speed on-chip shared memory provided by NVIDIA’s GPU architecture and efficient data-parallel primitives, particularly parallel scan. While targeted at GPUs, these algorithms should also be well-suited for other manycore processors. (N. Satish, M. Harris, and M. Garland. Designing efficient sorting algorithms for manycore GPUs. Proc. 23rd IEEE Int’l Parallel & Distributed Processing Symposium, May 2009. To appear.)
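To show how a scan primitive can drive a radix sort, here is a minimal sequential sketch in Python. It is not the authors' CUDA implementation: it handles one bit per pass for simplicity, assumes non-negative integer keys that fit in `num_bits` bits, and the function names are chosen for illustration only. On a GPU, the scan and the scatter step would each run in parallel across the keys.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def split_by_bit(keys, bit):
    """Stable partition: keys whose given bit is 0 come first, then the rest."""
    flags = [1 - ((k >> bit) & 1) for k in keys]  # 1 where the bit is 0
    zeros_before = exclusive_scan(flags)          # destination for bit-0 keys
    total_zeros = sum(flags)
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        if flags[i]:
            dest = zeros_before[i]
        else:
            # Count of bit-1 keys before i is i - zeros_before[i].
            dest = total_zeros + (i - zeros_before[i])
        out[dest] = k  # every key writes to a distinct slot
    return out


def radix_sort(keys, num_bits=32):
    """Least-significant-bit-first radix sort built from repeated splits."""
    for bit in range(num_bits):
        keys = split_by_bit(keys, bit)
    return keys


if __name__ == "__main__":
    print(radix_sort([5, 3, 8, 1, 3, 0], num_bits=4))
```

Because each key's destination is computed from the scan alone, no two keys write to the same slot, which is what makes the scatter step safe to parallelize.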
Posted in Research | Tags: Data-Parallel, NVIDIA CUDA, Parallel Algorithms, Sorting