July 17th, 2011
June 26th, 2011
A new alpha release of rCUDA 3.0 (Remote CUDA), the Open Source package that allows performing CUDA calls to remote GPUs, has been released. Major improvements included in this new version are:
- Partially updated API to 4.0
- Added compatibility support with CUDA 4.0 environment
- Updated CUBLAS API to 4.0 for the most common CUBLAS routines
- Fixed some bugs
- General performance improvements
For further information, please visit the rCUDA webpage.
April 6th, 2011
We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function; two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely checkpointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent checkpointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.
(Hiroyuki Takizawa, Kentaro Koyama, Katuto Sato, Kazuhiko Komatsu, and Hiroaki Kobayashi: “CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications”, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS11), 2011. [PDF])
March 29th, 2011
HOOMD-blue performs general-purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many cores on a fast cluster. Flexible and configurable, HOOMD-blue is currently being used for coarse-grained molecular dynamics simulations of nano-materials, glasses, and surfactants, dissipative particle dynamics simulations (DPD) of polymers, and crystallization of metals.
HOOMD-blue 0.9.2 adds many new features. Highlights include:
- Long-ranged electrostatics via PPPM
- Support for CUDA 3.2 and 4.0
- New neighbor list option to exclude by particle diameter (for pair.slj)
- New syntax to specify multiple pair coefficients at once
- Improved documentation
- Significant performance boosts for small simulations
- RPM and .deb packaging for CentOS, Fedora, and Ubuntu
- and more
HOOMD-blue 0.9.2 is available for download under an open source license. Check out the quick start tutorial to get started, or check out the full documentation to see everything it can do.
March 1st, 2011
Heterogeneous computing is moving into the mainstream, and a broader range of applications are already on the way. As the provider of world-class CPUs, GPUs, and APUs, AMD offers unique insight into these technologies and how they interoperate. We’ve been working with industry and academia partners to help advance real-world use of these technologies, and to understand the opportunities that lie ahead. It’s time to share what we’ve learned so far.
With tutorials, hands-on labs, and sessions that span a range of topics from HPC to multimedia, you’ll have the opportunity to expand your view of what heterogeneous computing currently offers and where it is going. You’ll hear from industry innovators and academic pioneers who are exploring different ways of approaching problems, and utilizing new paradigms in computing to help identify solutions. You’ll meet AMD experts with deep knowledge of hardware architectures and the software techniques that best leverage those platforms. And you’ll connect with other software professionals who share your passion for the future of technology.
Learn more at developer.amd.com/afds.
February 28th, 2011
Today NVIDIA announced the upcoming 4.0 release of CUDA. While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features. With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool. Full details follow in the quoted press release below.
Read the rest of this entry »
February 10th, 2011
Following in the footsteps of the highly successful GPU Users meetup groups in Brisbane, Sydney, Perth and Melbourne, Australia, new GPU meetup groups are popping up around the USA and other countries. Professional “meetup” groups have now formed in New York City, Silicon Valley, Boston, Chicago, Albuquerque and Tokyo, bringing practitioners together to discuss the applications, methods, and technical challenges of using GPUs for algorithm acceleration. The events are free to attend. More information can be found at http://gpu.meetup.com/.
Check out our User Groups page for more.
November 22nd, 2010
Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.
A protein-protein interaction network is represented as proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs and computational methods are now inevitable to detect them from protein interaction data. In addition, High-throughput screening techniques such as yeast two-hybrid screening enable identification of detailed protein-protein interactions map in multiple species. As the interaction dataset increases, the scale of interconnected protein networks increases exponentially so that the increasing complexity of network gives computational challenges to analyze the networks. Read the rest of this entry »
November 16th, 2010
From a recent announcement:
We are excited to announce the immediate availability of Cluster GPU Instances for Amazon EC2, a new instance type designed to deliver the power of GPU processing in the cloud. GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology. Amazon Cluster GPU Instances remove this barrier by providing developers and businesses immediate access to the highly tuned compute performance of GPUs with no upfront investment or long-term commitment.
Learn more about the new Cluster GPU instances for Amazon EC2 and their use in running HPC applications.
Also, community support is becoming available; see for instance this blog post about SCG-Ruby on EC2 instances.
November 16th, 2010
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks.
John E. Stone, David J. Hardy, Ivan S. Ufimtsev, and Klaus Schulten: “GPU-Accelerated Molecular Modeling Coming of Age”, Journal of Molecular Graphics and Modelling, Volume 29, Issue 2, September 2010, Pages 116-125. [DOI])
The emergence of Graphics Processing Units (GPUs) as a potential alternative to conventional general-purpose processors has led to significant interest in these architectures by both the academic community and the High Performance Computing (HPC) industry. While GPUs look likely to deliver unparalleled levels of performance, the publication of studies claiming performance improvements in excess of 30,000x are misleading. Significant on-node performance improvements have been demonstrated for code kernels and algorithms amenable to GPU acceleration; studies demonstrating comparable results for full scientific applications requiring multiple-GPU architectures are rare.
In this paper we present an analysis of a port of the NAS LU benchmark to NVIDIA’s Compute Unified Device Architecture (CUDA) – the most stable GPU programming model currently available. Our solution is also extended to multiple nodes and multiple GPU devices.
Runtime performance on several GPUs is presented, ranging from low-end, consumer-grade cards such as the 8400GS to NVIDIA’s flagship Fermi HPC processor found in the recently released C2050. We compare the runtimes of these devices to several processors including those from Intel, AMD and IBM.
In addition to this we utilise a recently developed performance model of LU. With this we predict the runtime performance of LU on large-scale distributed GPU clusters, which are predicted to become commonplace in future high-end HPC architectural solutions.
(S.J. Pennycook, S.D. Harmond, S.A. Jarvis and G.R. Mudalige: “Implementation of the NAS-LU Benchmark”, 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), held as part of Supercomputing 2010 (SC’10), New Orleans, LA, USA. [PDF])