State of GPU Virtualization for CUDA Applications 2014

August 14th, 2014

This blog entry provides an introduction to GPU virtualization, reviewing the five major technology vendors and their virtualization support for CUDA.

rCUDA 4.2 version available

June 19th, 2014

A new version of the rCUDA middleware has been released (version 4.2). In addition to fixing some minor bugs, the new release adds support for:

  • CUDA 6.0 Runtime API
  • New stream management
  • cuSPARSE libraries

The rCUDA middleware lets a CUDA application seamlessly use GPUs installed in cluster nodes other than the one executing the application, without requiring any modification to your program. Please visit www.rcuda.net for more details about the rCUDA technology.
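To make concrete what "without requiring any modification" means, here is a minimal sketch (our own illustration, not code from the rCUDA distribution) of an ordinary CUDA 6.0 Runtime API program that exercises the stream support listed above. Under rCUDA, exactly this code runs unchanged: the middleware intercepts each runtime call and services it on a remote GPU.

    // saxpy_stream.cu -- plain CUDA Runtime API code; nothing rCUDA-specific.
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *hx, *hy, *dx, *dy;
        cudaMallocHost(&hx, bytes);          // pinned memory, required for async copies
        cudaMallocHost(&hy, bytes);
        for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }
        cudaMalloc(&dx, bytes);
        cudaMalloc(&dy, bytes);

        cudaStream_t s;
        cudaStreamCreate(&s);                // stream management: improved in rCUDA 4.2
        cudaMemcpyAsync(dx, hx, bytes, cudaMemcpyHostToDevice, s);
        cudaMemcpyAsync(dy, hy, bytes, cudaMemcpyHostToDevice, s);
        saxpy<<<(n + 255) / 256, 256, 0, s>>>(n, 3.0f, dx, dy);
        cudaMemcpyAsync(hy, dy, bytes, cudaMemcpyDeviceToHost, s);
        cudaStreamSynchronize(s);

        printf("y[0] = %f\n", hy[0]);        // expect 5.0
        cudaStreamDestroy(s);
        cudaFree(dx); cudaFree(dy);
        cudaFreeHost(hx); cudaFreeHost(hy);
        return 0;
    }

Compiled with nvcc, the resulting binary can be pointed at rCUDA's replacement runtime library instead of NVIDIA's, with no source changes.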

New rCUDA 4.1 version available

March 26th, 2014

A new version of the rCUDA middleware has been released (version 4.1). In addition to fixing some bugs related to asynchronous memory transfers, the new release adds support for:

  • CUDA 5.5 Runtime API
  • Mellanox Connect-IB network adapters
  • Dynamic Parallelism
  • cuFFT and cuBLAS libraries

The rCUDA middleware lets a CUDA application seamlessly use GPUs installed in cluster nodes other than the one executing the application, without requiring you to modify or recompile your program. Please visit www.rcuda.net for more details about the rCUDA technology.
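As an illustration of the newly supported libraries, the sketch below (our own example, assuming the cuBLAS v2 API shipped with CUDA 5.5, not code from the rCUDA package) computes a dot product through cuBLAS exactly as it would on a local GPU; with rCUDA, the same calls are serviced by a remote one.

    // dot_cublas.cu -- plain cuBLAS usage; rCUDA forwards these calls transparently.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        const int n = 1024;
        float hx[1024], hy[1024];
        for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t h;
        cublasCreate(&h);
        float result = 0.0f;
        cublasSdot(h, n, dx, 1, dy, 1, &result);   // dot product on the (remote) GPU
        printf("dot = %f\n", result);              // expect 2048.0
        cublasDestroy(h);
        cudaFree(dx); cudaFree(dy);
        return 0;
    }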

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

October 7th, 2013

Abstract:

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is impaired primarily by acquisition costs and power consumption. Therefore, the benefits of sharing a reduced number of GPUs among all the nodes of a cluster can be remarkable for many applications. This approach, usually referred to as remote GPU virtualization, aims at reducing the number of GPUs present in a cluster, while increasing their utilization rate. The performance of the interconnection network is key to achieving reasonable performance results by means of remote GPU virtualization. To this end, several networking technologies with throughput comparable to that of PCI Express have appeared recently. In this paper we analyze the influence of InfiniBand FDR on the performance of remote GPU virtualization, comparing its impact on a variety of GPU-accelerated applications with other networking technologies, such as InfiniBand QDR and Gigabit Ethernet. Given the severe limitations of freely available remote GPU virtualization solutions, the rCUDA framework is used as the case study for this analysis. Results show that the new FDR interconnect, featuring higher bandwidth than its predecessors, allows the reduction of the overhead of using GPUs remotely, thus making this approach even more appealing.

(Carlos Reano, Rafael Mayo, Enrique S. Quintana-Ortí, Federico Silla, José Duato and Antonio J. Pena: “Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization”. Proceedings of the IEEE Cluster 2013 Conference, Indianapolis, USA, September 2013. [PDF])
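For context on why the interconnect matters here, these are the nominal data rates of the links compared in the paper (figures taken from the published InfiniBand and PCIe specifications, not from the paper itself):

    Gigabit Ethernet:       1 Gb/s                        ~0.125 GB/s of data
    InfiniBand QDR (4x):   40 Gb/s signaling, 8b/10b   -> ~4 GB/s of data
    InfiniBand FDR (4x):   56 Gb/s signaling, 64b/66b  -> ~6.8 GB/s of data
    PCI Express 2.0 x16:   (8b/10b)                       ~8 GB/s per direction

FDR is thus the first of these interconnects whose effective bandwidth approaches that of the PCIe slot a local GPU occupies, which is why it shrinks the overhead of accessing a GPU remotely.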

rCUDA now available for the ARM architecture

July 26th, 2013

The rCUDA team is glad to announce that its remote GPU virtualization technology now supports the ARM processor architecture. The new release of rCUDA for this low-power processor has been built for the Ubuntu 11.04 and Ubuntu 12.04 ARM Linux distributions. This release also makes it possible to leverage hybrid platforms in which an application running on ARM CPUs requests acceleration services from remote GPUs installed in x86 nodes. The opposite is also possible: an application running on an x86 computer can access remote GPUs attached to ARM systems. Please visit the rCUDA website for more information or to request a free copy of the rCUDA middleware.
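In practice, a hybrid setup of this kind is configured entirely outside the application. The sketch below is illustrative only: the daemon and environment variable names follow the rCUDA user guide of that period, and the hostname and path are made up, so check the current documentation before copying it.

    # On the x86 node that hosts the GPU: start the rCUDA server daemon.
    ./rCUDAd

    # On the ARM client: name the remote GPU and run the unmodified binary
    # against rCUDA's replacement CUDA runtime library.
    export RCUDA_DEVICE_COUNT=1
    export RCUDA_DEVICE_0=x86-node.cluster.local:0
    export LD_LIBRARY_PATH=<path to the rCUDA client library>:$LD_LIBRARY_PATH
    ./my_cuda_app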

rCUDA 3.0a released

July 17th, 2011

A new alpha release of rCUDA 3.0 (Remote CUDA), the open-source package that allows CUDA calls to be performed on remote GPUs, has been released. The major improvements in this version are:

  • Partially updated the API to CUDA 4.0
  • Added compatibility with the CUDA 4.0 environment
  • Updated the CUBLAS API to 4.0 for the most common CUBLAS routines
  • Fixed some bugs
  • General performance improvements

For further information, please visit the rCUDA webpage.

A GPGPU transparent virtualization component for high performance computing clouds

October 4th, 2010

Abstract:

The promise of exascale computing power is reinforced by many-core technology, which encompasses general-purpose CPUs as well as specialized computing devices such as FPGAs, DSPs, and GPUs. GPUs in particular, thanks partly to their wide market footprint, currently achieve one of the best core/cost ratios in that category. Relying on APIs provided by GPU vendors, the scientific community now routinely uses GPUs as general-purpose massively parallel computing devices (GPGPUs). The increasing number of CPU cores per chip has driven the development and spread of cloud computing, which builds on consolidated technologies such as, but not limited to, grid computing and virtualization. In recent years, the use of grid computing for performance-demanding e-science applications has become commonplace. The elastic compute power and storage provided by a cloud infrastructure are attractive, but remain limited by poor communication performance and by the lack of support for using GPGPUs within a virtual machine instance. The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house computing clusters equipped with GPGPU devices and pay-per-use high-performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs transparently, with an overhead only slightly greater than that of a real machine/GPGPU setup. gVirtuS is hypervisor-independent and, even though it currently virtualizes NVIDIA CUDA-based GPUs, it is not limited to a specific vendor's technology. The performance of the components of gVirtuS is assessed through a suite of tests in different deployment scenarios, such as providing GPGPU power to cloud-based HPC clusters and sharing remotely hosted GPGPUs among HPC nodes.

(G. Giunta, R. Montella, G. Agrillo, and G. Coviello: “A GPGPU transparent virtualization component for high performance computing clouds”. In P. D’Ambra, M. Guarracino, and D. Talia, editors, Euro-Par 2010 – Parallel Processing, volume 6271 of Lecture Notes in Computer Science, chapter 37, pages 379-391. Springer Berlin / Heidelberg, 2010. DOI. Link to project webpage with source code.)
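The transparency that both gVirtuS and rCUDA offer rests on API remoting: the guest or client links against a library that exports the CUDA runtime's symbols, serializes each call, and ships it to a back end with real GPU access. The fragment below is a deliberately minimal, self-contained illustration of the interception half of that idea; it is not gVirtuS source code, and instead of forwarding over a remote channel it simply logs the call and falls through to the real library.

    /* api_remoting_shim.c -- illustrative CUDA-runtime interposer.
       Build:  gcc -shared -fPIC api_remoting_shim.c -o shim.so -ldl
       Use:    LD_PRELOAD=./shim.so ./any_cuda_app                    */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stddef.h>

    typedef int cudaError_t;   /* stands in for the runtime's status enum */

    /* Interpose cudaMalloc. A real remoting layer (rCUDA, gVirtuS) would
       serialize the arguments and send them to a back-end process that owns
       the GPU; here we just log and call the genuine implementation. */
    cudaError_t cudaMalloc(void **devPtr, size_t size)
    {
        static cudaError_t (*real_malloc)(void **, size_t) = NULL;
        if (!real_malloc)
            real_malloc = (cudaError_t (*)(void **, size_t))
                          dlsym(RTLD_NEXT, "cudaMalloc");
        fprintf(stderr, "[shim] cudaMalloc(%zu bytes)\n", size);
        return real_malloc(devPtr, size);
    }

In gVirtuS, the front-end library plays this role inside the virtual machine, while pluggable communicators (TCP/IP or hypervisor-specific channels) carry the serialized calls to the back end on the host, which is what makes the component hypervisor-independent.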