October 19th, 2011
A paper detailing several possible avenues for extending MPI to accelerators has just been presented at "Architectures and Systems for Big Data (ASBD) 2011", a workshop at PACT 2011. The abstract and a link to the paper are both below. We (the authors) are looking for feedback on which options seem attractive to GPU programmers and developers. We welcome any comments, thoughts, or critiques you might have.
Current trends in computing and system architecture point towards a need for accelerators such as GPUs to have inherent communication capabilities. We review previous and current software libraries that provide pseudo-communication abilities through direct message passing. We show how these libraries are beneficial to the HPC community, but are not forward-thinking enough. We give motivation as to why MPI should be extended to support these accelerators, and provide a road map of achievable milestones to complete such an extension, some of which require advances in hardware and device drivers.
(Jeff A. Stuart, Pavan Balaji and John D. Owens, “Extending MPI to Accelerators”, PACT 2011 Workshop Series: Architectures and Systems for Big Data, October 2011. [WWW])
Posted in Research | Tags: Clusters, MPI, NVIDIA GPUdirect, Papers
October 19th, 2011
OCLTools is a powerful yet compact suite of Open Source tools that gives OpenCL developers more alternatives for kernel compilation. OCLTools enables developers to eliminate costly kernel compilation time from the runtime of their applications. With OCLTools, developers can embed the source code of their kernels (clear text or encrypted) directly into their program binaries, eliminating the need to distribute kernel source code in the open while still maintaining the flexibility of runtime compilation. Both source code and precompiled binaries can be embedded into OpenCL binaries, effectively eliminating the additional kernel compilation overhead from the application's run time.
For more information go to http://www.clusterchimps.org
Posted in Developer Resources | Tags: Open Source, OpenCL, Tools
October 7th, 2011
The 2012 Spring Simulation Multi-conference will feature the 20th High Performance Computing Symposium (HPC 2012), devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include:
- high performance/large scale application case studies,
- GPUs for general purpose computations (GPGPU),
- multicore and many-core computing,
- power aware computing,
- large scale visualization and data management,
- tools and environments for coupling parallel codes,
- parallel algorithms and architectures,
- high performance software tools,
- component technologies for high performance computing.
Important dates: Paper submission due: December 2, 2011; Notification of acceptance: January 13, 2012; Revised manuscript due: January 27, 2012; Symposium: March 26–29, 2012.
Posted in Events, Research | Tags: Conferences, High-Performance Computing, Scientific Computing
October 2nd, 2011
Abstract:
In this paper, we propose a fine-grained cycle sharing (FGCS) system capable of exploiting idle graphics processing units (GPUs) for accelerating sequence homology search in local area network environments. Our system exploits short idle periods on GPUs by running small parts of guest programs such that each part can be completed within hundreds of milliseconds. To detect such short idle periods from the pool of registered resources, our system continuously monitors keyboard and mouse activities via event handlers rather than waiting for a screensaver, as is typically deployed in existing systems. Our system also divides guest tasks into small parts according to a performance model that estimates execution times of the parts. This task division strategy minimizes any disruption to the owners of the GPU resources. Experimental results show that our FGCS system running on two non-dedicated GPUs achieves 111-116% of the throughput achieved by a single dedicated GPU. Furthermore, our system provides over two times the throughput of a screensaver-based system. We also show that the idle periods detected by our system constitute half of the system uptime. We believe that the GPUs hidden and often unused in office environments provide a powerful solution to sequence homology search.
(Fumihiko Ino, Yuma Munekawa, and Kenichi Hagihara, “Sequence Homology Search using Fine-Grained Cycle Sharing of Idle GPUs”, accepted for publication in IEEE Transactions on Parallel and Distributed Systems, Sep. 2011. [DOI])
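As a rough illustration of the task division idea described in the abstract, the Python sketch below splits a workload into parts whose predicted run time stays within a time budget. It assumes a simple linear performance model (a fixed overhead plus a per-item cost); the paper's actual model is not given here, and all names and numbers (`split_task`, `predicted_ms`, the 200 ms budget, the cost constants) are hypothetical.

```python
def predicted_ms(num_items, overhead_ms, per_item_ms):
    """Assumed linear performance model: fixed overhead plus per-item cost."""
    return overhead_ms + per_item_ms * num_items


def max_items_per_part(budget_ms, overhead_ms, per_item_ms):
    """Largest part size whose predicted time still fits in the budget."""
    if per_item_ms <= 0:
        raise ValueError("per_item_ms must be positive")
    usable_ms = budget_ms - overhead_ms
    if usable_ms < per_item_ms:
        raise ValueError("budget too small to process even one item")
    return int(usable_ms // per_item_ms)


def split_task(total_items, budget_ms, overhead_ms, per_item_ms):
    """Split [0, total_items) into half-open ranges that each fit the budget."""
    size = max_items_per_part(budget_ms, overhead_ms, per_item_ms)
    return [
        (start, min(start + size, total_items))
        for start in range(0, total_items, size)
    ]


# Hypothetical numbers: 1000 items, 200 ms budget per part,
# 20 ms fixed overhead, 0.5 ms per item.
parts = split_task(1000, budget_ms=200.0, overhead_ms=20.0, per_item_ms=0.5)
```

Keeping each part short means a guest program can yield the GPU quickly once the resource owner becomes active again, which is the property the abstract attributes to its task division strategy.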
Posted in Research | Tags: Bioinformatics, NVIDIA CUDA, Papers, Sequence Alignment
September 24th, 2011
The second 2-day CUDA programming workshop in Berlin takes place November 5-6. Course details, outline and prices are available at http://cuda.eventbrite.com.
Posted in Business, Events | Tags: Courses, NVIDIA CUDA
September 24th, 2011
The latest release of Symscape’s ofgpu (v0.2) for OpenFOAM® 2.0.x is now available. ofgpu is an open source experimental linear solver library that targets NVIDIA CUDA GPU devices on Windows, Linux, and (untested) Mac OS X. ofgpu now has support for the Cusp preconditioners:
- smoothed_aggregation – equivalent to Algebraic Multi-Grid (AMG)
- scaled_bridson_ainv
- bridson_ainv
- nonsym_bridson_ainv
Also supported is the option to select the GPU device. For more details see http://www.symscape.com/gpu-0-2-openfoam.
Posted in Developer Resources | Tags: Iterative Solvers, NVIDIA CUDA, Open Source, OpenFOAM
September 15th, 2011
AMD has just released Aparapi as open source, a project that started in its JavaLabs team. Aparapi is an API for expressing data-parallel workloads in Java, together with a runtime component capable of converting the Java bytecode of compatible workloads into OpenCL™ so that they can be executed on a variety of GPU devices. More information can be found in this blog entry.
Posted in Developer Resources | Tags: AMD, Java, Open Source, OpenCL, Tools
September 12th, 2011
Abstract:
This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU Computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.
(Nathan Bell and Jared Hoberock: “Thrust: A Productivity-Oriented Library for CUDA”, GPU Computing Gems, Jade Edition, edited by Wen-mei W. Hwu, October 2011)
Posted in Developer Resources, Research | Tags: Libraries, NVIDIA CUDA, Papers, Tools
September 10th, 2011
From the abstract of a GPU market analysis whitepaper by Jon Peddie Research:
Computer graphics is hard work. Behind the images you see in games and movies, or while editing photos or video, some serious processing is taking place. All the processing power you can muster is needed to push and polish pixels. And this task is only going to get more demanding as these applications get more sophisticated. Graphics Processing Units (GPUs), which do the heavy lifting in computer graphics, range greatly in size, price and performance. They span from tiny cores inside an ARM processor (such as Nvidia’s Tegra or Qualcomm’s Snapdragon), to graphics integrated within an X86 processor (such as AMD’s Fusion, Intel’s Sandy Bridge), to a standalone discrete device, or dGPU (such as AMD’s Radeon, or Nvidia’s GeForce).
More information: http://jonpeddie.com/media/presentations/an-analysis-of-the-gpu-market/
Posted in Business | Tags: GPUs, Market
September 8th, 2011
libCL is an open-source parallel algorithm library written in C++ and OpenCL. Rather than targeting a specific domain, libCL intends to encompass a wide range of parallel algorithms and data structures. The goal is to provide a comprehensive repository for high-performance, visual-centric computing, ranging from fundamental primitives such as sorting, searching and algebra to advanced systems of algorithms for computational research and visualization. The current distribution of libCL already contains fully parallelized implementations of the following algorithms:
- Bounding volume hierarchy construction
- Smoothed particle hydrodynamics
- Radix sort
- Adaptive tone-mapping
- Screen-space ambient occlusion culling
- Bilateral and Recursive Gaussian
libCL emerged out of OpenCL Studio, and as such integrates well with the development environment and its visualization capabilities. libCL is Open Source and released under the Apache license.
Posted in Developer Resources | Tags: Open Source, OpenCL