December 14th, 2011
October 20th, 2011
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).
Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: “On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures”, The Computer Journal (in press) [DOI] [PREPRINT])
October 7th, 2011
The new version 3.1 of rCUDA (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, is now available. Release highlights:
- Fully updated API to CUDA 4.0 (added support for modules “Peer Device Memory Access” and “Unified Addressing”).
- Fixed low level Surface Reference management functions.
For further information, please visit the rCUDA webpage at http://www.gap.upv.es/rCUDA.
July 17th, 2011
The 2012 Spring Simulation Multi-conference will feature the 20th High Performance Computing Symposium (HPC 2012), devoted to the impact of high performance computing and communications on computer simulations. Topics of interest include:
- high performance/large scale application case studies,
- GPUs for general purpose computations (GPGPU)
- multicore and many-core computing,
- power aware computing,
- large scale visualization and data management,
- tools and environments for coupling parallel codes,
- parallel algorithms and architectures,
- high performance software tools,
- component technologies for high performance computing.
Important dates: Paper submission due: December 2, 2011; Notification of acceptance: January 13, 2012; Revised manuscript due: January 27, 2012; Symposium: March 26–29, 2012.
June 26th, 2011
A new alpha release of rCUDA 3.0 (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, has been released. Major improvements included in this new version are:
- Partially updated API to 4.0
- Added compatibility support with CUDA 4.0 environment
- Updated CUBLAS API to 4.0 for the most common CUBLAS routines
- Fixed some bugs
- General performance improvements
For further information, please visit the rCUDA webpage.
April 6th, 2011
We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function; two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.
(Hiroyuki Takizawa, Kentaro Koyama, Katuto Sato, Kazuhiko Komatsu, and Hiroaki Kobayashi: “CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications”, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS11), 2011. [PDF])
March 29th, 2011
HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.
HOOMD-blue 0.9.2 adds many new features. Highlights include:
- Long-ranged electrostatics via PPPM
- Support for CUDA 3.2 and 4.0
- New neighbor list option to exclude by particle diameter (for pair.slj)
- New syntax to specify multiple pair coefficients at once
- Improved documentation
- Significant performance boosts for small simulations
- RPM and .deb packaging for CentOS, Fedora, and Ubuntu
- and more
HOOMD-blue 0.9.2 is available for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.
March 1st, 2011
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academia partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.
With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.
Learn more at developer.amd.com/afds.
February 28th, 2011
Today NVIDIA announced the upcoming 4.0 release of CUDA. While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features. With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool. Full details follow in the quoted press release below.
Read the rest of this entry »
February 10th, 2011
Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at http://gpu.meetup.com/.
Check out our User Groups page for more.
Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.
A protein-protein interaction network is represented as proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs and computational methods are now inevitable to detect them from protein interaction data. In addition, High-throughput screening techniques such as yeast two-hybrid screening enable identification of detailed protein-protein interactions map in multiple species. As the interaction dataset increases, the scale of interconnected protein networks increases exponentially so that the increasing complexity of network gives computational challenges to analyze the networks. Read the rest of this entry »