December 14th, 2011
Abstract:
In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications—a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P).
Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
(Pennycook, S.J., Hammond, S.D., Mudalige, G.R., Wright, S.A. and Jarvis, S.A.: “On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures”, The Computer Journal (in press) [DOI] [PREPRINT])
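For readers unfamiliar with the term, the dependency pattern behind a wavefront sweep can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the paper's LU port: the function names and the toy update rule (each cell is one more than the largest of its three upstream neighbours) are made up for this example. The point is that all cells on a hyperplane i + j + k = d depend only on the previous hyperplane, so they can be processed together.

```python
from itertools import product


def wavefront_planes(nx, ny, nz):
    """Group cell indices by hyperplane i + j + k = d.

    Cells on plane d depend only on cells from plane d - 1, so every
    cell within one plane could be updated in parallel.
    """
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        planes[i + j + k].append((i, j, k))
    return planes


def sweep(nx, ny, nz):
    """Toy sweep: each cell is 1 + the max of its three upstream neighbours."""
    value = {}
    for plane in wavefront_planes(nx, ny, nz):
        for i, j, k in plane:  # iterations here are independent of each other
            upstream = (
                value.get((i - 1, j, k), 0),
                value.get((i, j - 1, k), 0),
                value.get((i, j, k - 1), 0),
            )
            value[(i, j, k)] = 1 + max(upstream)
    return value
```

In this toy setting a 4 x 3 x 2 grid is swept in 7 steps rather than 24, which is the parallelism a wavefront code tries to exploit; the early and late planes contain very few cells, which hints at why decomposition choices matter on wide devices.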
Posted in Research | Tags: Clusters, High-Performance Computing, Linear Algebra, NVIDIA CUDA, Papers
October 20th, 2011
Version 3.1 of rCUDA (Remote CUDA), the Open Source package that lets applications issue CUDA calls to remote GPUs, is now available. Release highlights:
- API fully updated to CUDA 4.0 (adds support for the “Peer Device Memory Access” and “Unified Addressing” modules).
- Fixed low-level Surface Reference management functions.
For further information, please visit the rCUDA webpage at http://www.gap.upv.es/rCUDA.
Posted in Developer Resources | Tags: Clusters, High-Performance Computing, Libraries, NVIDIA CUDA, Open Source | 2 Comments
October 19th, 2011
A paper detailing several possible avenues to expand MPI to accelerators has just been presented at “Architectures and Systems for Big Data (ASBD) 2011”, a workshop at PACT 2011. The abstract and a link to the paper are both below. We (the authors) are looking for feedback on which options seem attractive to GPU programmers and developers. We welcome any comments, thoughts or critiques you might have.
Current trends in computing and system architecture point towards a need for accelerators such as GPUs to have inherent communication capabilities. We review previous and current software libraries that provide pseudo-communication abilities through direct message passing. We show how these libraries are beneficial to the HPC community, but are not forward-thinking enough. We give motivation as to why MPI should be extended to support these accelerators, and provide a road map of achievable milestones to complete such an extension, some of which require advances in hardware and device drivers.
(Jeff A. Stuart, Pavan Balaji and John D. Owens, “Extending MPI to Accelerators”, PACT 2011 Workshop Series: Architectures and Systems for Big Data, October 2011. [WWW])
Posted in Research | Tags: Clusters, MPI, NVIDIA GPUdirect, Papers
July 17th, 2011
An alpha release of rCUDA 3.0 (Remote CUDA), the Open Source package that lets applications issue CUDA calls to remote GPUs, is now available. Major improvements in this version are:
- Partially updated API to 4.0
- Added compatibility support with CUDA 4.0 environment
- Updated CUBLAS API to 4.0 for the most common CUBLAS routines
- Fixed some bugs
- General performance improvements
For further information, please visit the rCUDA webpage.
Posted in Developer Resources | Tags: Clusters, High-Performance Computing, Libraries, NVIDIA CUDA, Virtualisation
June 14th, 2011
Computing in Science & Engineering seeks articles for a May/June 2012 special issue focusing on the use of GPUs in science and engineering applications. Contributions covering all aspects of using GPUs for solving challenging computational science problems are welcome. Of special interest are articles presenting results of porting efforts of large-scale scientific applications on large-scale GPU-based high-performance computers. See the full call for papers for complete details.
Posted in Research | Tags: Call for Papers, Clusters, Computational Science and Engineering, Journals
November 27th, 2010
A new major release of rCUDA™ (Remote CUDA), the Open Source package that lets applications issue CUDA calls to remote GPUs, is now available. The major improvements in the new version are:
- Updated API to 3.1
- Server now uses Runtime API when possible (CUDA >= 3.1 required)
- Introduced support for the most common CUBLAS routines
- Fixed some bugs
- Added AF_UNIX sockets support to enhance performance on local executions
- Added some load balancing capabilities to the server
- General performance improvements
- Officially added Fermi support
Further information is available from the rCUDA™ webpages http://www.gap.upv.es/rCUDA and http://www.hpca.uji.es/rCUDA.
Posted in Developer Resources | Tags: Clusters, Libraries, Multi-GPU, NVIDIA CUDA, Tools
November 22nd, 2010
From a recent announcement:
We are excited to announce the immediate availability of Cluster GPU Instances for Amazon EC2, a new instance type designed to deliver the power of GPU processing in the cloud. GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology. Amazon Cluster GPU Instances remove this barrier by providing developers and businesses immediate access to the highly tuned compute performance of GPUs with no upfront investment or long-term commitment.
Learn more about the new Cluster GPU instances for Amazon EC2 and their use in running HPC applications.
Also, community support is becoming available; see for instance this blog post about SCG-Ruby on EC2 instances.
Posted in Business, Developer Resources | Tags: Cloud Computing, Clusters, High-Performance Computing, NVIDIA FERMI, On-Demand Computing
June 23rd, 2010
Abstract:
We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours is implemented successfully in single precision, maximizing the performance of current generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing very optimized implementation in C language and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages in order to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
(Dimitri Komatitsch, Gordon Erlebacher, Dominik Göddeke and David Michéa: “High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster”, accepted for publication in: Journal of Computational Physics, Jun. 2010. PDF preprint. DOI link.)
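The mesh coloring idea mentioned in the abstract can be sketched briefly in Python. This is a generic greedy colouring written for illustration, not the authors' implementation, and the function names and the tiny 1D mesh in the usage note are assumptions. Elements that share a node receive different colours, so all elements of one colour can add their contributions to the shared degrees of freedom without write conflicts.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy colouring: elements that share a node get different colours."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 means "not coloured yet"
    for e, nodes in enumerate(elements):
        used = {
            colors[other]
            for n in nodes
            for other in node_to_elems[n]
            if colors[other] != -1
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble(elements, contributions, num_nodes):
    """Sum per-element contributions into nodes, one colour at a time."""
    colors = color_elements(elements)
    total = [0.0] * num_nodes
    for c in range(max(colors, default=-1) + 1):
        # Within one colour no node is touched twice, so these updates
        # could run concurrently without atomic operations.
        for e, nodes in enumerate(elements):
            if colors[e] == c:
                for n in nodes:
                    total[n] += contributions[e]
    return total
```

For a chain of three two-node elements, `color_elements([(0, 1), (1, 2), (2, 3)])` alternates between two colours, and the assembled sums match what a plain sequential loop would produce.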
Posted in Research | Tags: Clusters, Finite Element Methods, High-Performance Computing, NVIDIA CUDA, Papers, Scientific Computing | 1 Comment
June 1st, 2010
From a white paper by GE Intelligent Platforms (Link):
This white paper describes how GPGPU technology can allow system designers to fit an unprecedented amount of processing power into a very compact package. For example, it describes four GE Intelligent Platforms 3U VPX boards with a floating point performance of 766 GFLOPS in less than 0.4 cubic feet. With configuration control and lifecycle management from a leading COTS supplier, these technologies are clearly ready for duty.
Posted in Business | Tags: Clusters, System Infrastructure, systems
April 5th, 2010
The GAP (Universidad Politécnica de Valencia, Spain) and HPCA (Universidad Jaume I, Spain) research groups are proud to announce the public release of rCUDA 1.0. The rCUDA Framework enables the concurrent usage of CUDA-compatible devices remotely by employing the sockets API for communication between clients and servers. Thus, it can be useful in three different environments:
- Clusters. To reduce the number of GPUs installed in High Performance Clusters. This leads to energy savings, as well as other related savings like acquisition costs, maintenance, space, cooling, etc.
- Academia. To give all students concurrent access to a few high-performance GPUs, even over low-performance networks.
- Virtual Machines. To give virtual machines access to the CUDA facilities of the physical machine.
The current version of rCUDA (v1.0) implements all functions in the CUDA Runtime API version 2.3, excluding OpenGL and Direct3D interoperability. rCUDA 1.0 targets the Linux OS (for 32- and 64-bit architectures) on both client and server sides. The framework is free for any purpose under the terms and conditions of the GNU GPL/LGPL (where applicable) licenses.
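The general idea of forwarding calls from a client to a server over sockets can be sketched as follows. This is a simplified Python illustration, not rCUDA's protocol: the length-prefixed JSON message format, the `DEVICE_FUNCTIONS` table and the function names are all hypothetical, and plain Python functions stand in for work that a real server would run on its GPU.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# Hypothetical stand-ins for work that would run on the server's GPU.
DEVICE_FUNCTIONS = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, s: [x * s for x in a],
}


def serve_one(sock):
    """Server side: execute a single forwarded call and send back the result."""
    request = recv_msg(sock)
    result = DEVICE_FUNCTIONS[request["func"]](*request["args"])
    send_msg(sock, {"result": result})


def remote_call(sock, func, *args):
    """Client side: forward the call and block until the reply arrives."""
    send_msg(sock, {"func": func, "args": list(args)})
    return recv_msg(sock)["result"]


def demo():
    # A connected socket pair stands in for a real network connection.
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    try:
        return remote_call(client, "vector_add", [1, 2, 3], [10, 20, 30])
    finally:
        worker.join()
        client.close()
        server.close()
```

Because the client only sees `remote_call`, the application code looks the same whether the device is local or remote, which is the property that lets several clients share a small number of GPUs.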
For additional information, visit the rCUDA web page or Antonio Peña’s webpage.
Posted in Developer Resources, Research | Tags: Clusters, Libraries, NVIDIA CUDA, Parallel Programming, Tools