A GPGPU transparent virtualization component for high performance computing clouds

October 4th, 2010


The promise of exascale computing power is reinforced by many-core technology, which involves general-purpose CPUs and specialized computing devices such as FPGAs, DSPs, and GPUs. GPUs in particular, due in part to their wide market footprint, currently achieve one of the best core/cost ratios in that category. Through APIs provided by GPU vendors, the use of GPUs as general-purpose massively parallel computing devices (GPGPUs) is now routine in the scientific community. The increasing number of CPU cores per chip has driven the development and spread of cloud computing, which leverages consolidated technologies such as, but not limited to, grid computing and virtualization. In recent years the use of grid computing for high-performance-demanding applications in e-science has become common. The elastic computing power and storage provided by a cloud infrastructure may be attractive, but they are still limited by poor communication performance and the lack of support for using GPGPUs within a virtual machine instance. The GPU Virtualization Service (gVirtuS) presented in this work tries to fill the gap between in-house hosted computing clusters equipped with GPGPU devices and pay-for-use high performance virtual clusters deployed via public or private computing clouds. gVirtuS allows an instanced virtual machine to access GPGPUs in a transparent way, with an overhead slightly greater than that of a real machine/GPGPU setup. gVirtuS is hypervisor independent and, even though it currently virtualizes NVIDIA CUDA-based GPUs, it is not limited to a specific brand's technology. The performance of the components of gVirtuS is assessed through a suite of tests in different deployment scenarios, such as providing GPGPU power to cloud computing based HPC clusters and sharing remotely hosted GPGPUs among HPC nodes.

(Giunta G., R. Montella, G. Agrillo, and G. Coviello: “A GPGPU transparent virtualization component for high performance computing clouds”. In P. D’Ambra, M. Guarracino, and D. Talia, editors, Euro-Par 2010 – Parallel Processing, volume 6271 of Lecture Notes in Computer Science, chapter 37, pages 379-391. Springer Berlin / Heidelberg, 2010. DOI. Link to project webpage with source code.)

CfP: New Frontiers in High-performance and Hardware-aware Computing (HipHaC’11)

September 30th, 2010

The Second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC’11) is to be held in conjunction with the 17th IEEE International Symposium on High-Performance Computer Architecture (HPCA-17), co-located with the 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), on February 13, 2011, in San Antonio, Texas, USA.

This workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods.

BehaveRT: A GPU-Based Library for Autonomous Characters

September 27th, 2010


In this work, we present a GPU-based library, called BehaveRT, for the definition, real-time simulation, and visualization of large communities of individuals. We implemented a modular, flexible, and extensible architecture based on a plug-in infrastructure that enables the creation of a behavior engine system core. We used the Compute Unified Device Architecture (CUDA) for parallel programming, together with specific memory optimization techniques, to exploit the computational power of commodity graphics hardware, enabling developers to focus on the design and implementation of behavioral models. This paper illustrates the architecture of BehaveRT, the core plug-ins, and some case studies. In particular, we show two high-level behavioral models, picture and shape flocking, that generate images and shapes in 3D space by coordinating the positions and color-coding of individuals. We then present an environment discretization case study of the interaction of a community with generic virtual scenes such as irregular terrains and buildings.

(Ugo Erra, Bernardino Frola and Vittorio Scarano: “BehaveRT: A GPU-Based Library for Autonomous Characters”. MIG 2010, LNCS 6459, pp. 194-205, Springer. Link to project webpage with code, images and videos.)

Efficient High-Quality Volume Rendering of SPH Data

September 27th, 2010

Abstract:

High quality volume rendering of SPH data requires a complex order-dependent resampling of particle quantities along the view rays. In this paper we present an efficient approach to perform this task using a novel view-space discretization of the simulation domain. Our method draws upon recent work on GPU-based particle voxelization for the efficient resampling of particles into uniform grids. We propose a new technique that leverages a perspective grid to adaptively discretize the view-volume, giving rise to a continuous level-of-detail sampling structure and reducing memory requirements compared to a uniform grid. In combination with a level-of-detail representation of the particle set, the perspective grid allows effectively reducing the amount of primitives to be processed at run-time. We demonstrate the quality and performance of our method for the rendering of fluid and gas dynamics SPH simulations consisting of many millions of particles.

(Roland Fraedrich, Stefan Auer, and Rüdiger Westermann: “Efficient High-Quality Volume Rendering of SPH Data”, IEEE Transactions on Visualization and Computer Graphics (Proceedings of IEEE Visualization 2010), vol. 16, no. 6, Nov.-Dec. 2010, Link to project webpage including paper, pictures and video)

Fast and accurate protein substructure searching with simulated annealing and GPUs

September 19th, 2010


Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching.

We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing, that is as fast or faster, and comparable in accuracy, with some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

(Stivala, A. and Stuckey, P. and Wirth, A.: “Fast and accurate protein substructure searching with simulated annealing and GPUs”. BMC Bioinformatics, 11:446, Sep. 2010, DOI)

MOSIX Virtual OpenCL (VCL)

September 13th, 2010

The MOSIX group announces the availability of the first release of the MOSIX Virtual OpenCL (VCL) package, which allows OpenCL applications to transparently utilize many GPU devices in clusters. In the VCL run-time environment all the cluster devices are seen as if they are located in each hosting-node – applications need not be aware which nodes and devices are available and where the devices are located. As such, VCL benefits OpenCL applications that can use multiple devices concurrently.

VCL can be used to build powerful parallel GPU based clusters from low-cost multi-core hosting nodes that can utilize cluster-wide (CPU and GPU) resources transparently.

Database Compression on Graphics Processors

September 11th, 2010


Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, this co-processing requires the data transfer between the main memory and the GPU memory via a low-bandwidth PCI-E bus. The overhead of such data transfer becomes an important factor, even a bottleneck, for query co-processing performance on the GPU. In this paper, we propose to use compression to alleviate this performance problem. Specifically, we implement nine lightweight compression schemes on the GPU and further study the combinations of these schemes for a better compression ratio. We design a compression planner to find the optimal combination. Our experiments demonstrate that the GPU-based compression and decompression achieved a processing speed up to 45 and 56 GB/s respectively. Using partial decompression, we were able to significantly improve GPU-based query co-processing performance. As a side product, we have integrated our GPU-based compression into MonetDB, an open source column-oriented DBMS, and demonstrated the feasibility of offloading compression and decompression to the GPU.

(Wenbin Fang, Bingsheng He, Qiong Luo: “Database Compression on Graphics Processors”, PVLDB/VLDB 2010. Link to PDF.)
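
To make the idea of combining lightweight compression schemes concrete, here is a minimal CPU-side sketch in Python. The abstract does not name its nine schemes, so delta encoding and run-length encoding are used here purely as assumed examples of lightweight schemes, and all function names (`delta_encode`, `rle_encode`, `compress`, and so on) are hypothetical rather than taken from the paper or from MonetDB. The sketch only illustrates why cascading two schemes can beat either one alone on a sorted integer column; it does not model the GPU implementation or the compression planner.

```python
def delta_encode(values):
    """Keep the first value, then store differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Invert rle_encode by repeating each value run_length times."""
    return [v for v, n in runs for _ in range(n)]


def compress(values):
    """Assumed cascade: delta first, then run-length on the deltas."""
    return rle_encode(delta_encode(values))


def decompress(runs):
    """Undo the cascade in reverse order."""
    return delta_decode(rle_decode(runs))


if __name__ == "__main__":
    column = [100, 101, 102, 103, 104, 105]  # e.g. a sorted key column
    print(compress(column))  # [(100, 1), (1, 5)]
    assert decompress(compress(column)) == column
```

Run-length encoding alone would not shrink this column, since no value repeats; applying delta encoding first turns the steady increments into a run that run-length encoding can collapse. Choosing which cascade suits which column is the kind of decision the abstract assigns to its compression planner.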

PacketShader: A GPU-Accelerated Software Router

September 6th, 2010


We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively-parallel processing power of GPU to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that GPU brings significantly higher throughput over the CPU-only implementation, confirming the effectiveness of GPU for computation and memory-intensive operations in packet processing.

(Sangjin Han, Keon Jang, KyoungSoo Park and Sue Moon: “PacketShader: A GPU-accelerated Software Router”, Proceedings of ACM SIGCOMM 2010, Delhi, India, September 2010. Project webpage. DOI)

GPU-Based Speculative Query Processing for Database Operations

September 5th, 2010


With an increasing amount of data and user demands for fast query processing, the optimization of database operations continues to be a challenging task. A common optimization method is to leverage parallel hardware architectures. With the introduction of general-purpose GPU computing, massively parallel hardware has become available within commodity hardware. To efficiently exploit this technology, we introduce the method of speculative query processing. This speculative query processing works on, but is not limited to, a prefix tree structure to efficiently support heavily used database index operations. Fundamentally, our approach traverses a prefix tree structure in a speculative, parallel way instead of traversing it step by step. To show the benefits and opportunities of our novel approach, we present an exhaustive evaluation on a graphics processing unit.

(Volk, P. B.; Habich, D.; Lehner, W.: “GPU-Based Speculative Query Processing for Database Operations”. First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), held in conjunction with VLDB 2010, September 2010. Link to project webpage.)

Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model

August 1st, 2010


A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors up to 35 compared to an optimized single central processing unit (CPU) core implementation which employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As proof of concept we reproduce the critical temperature of the 2D Ising model using finite size scaling techniques.

(Benjamin Block, Peter Virnau and Tobias Preis: “Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model”, Computer Physics Communications 181:9, 1549-1556, Sep. 2010. DOI Link. arXiv link)
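
For readers unfamiliar with the checkerboard algorithm mentioned in the abstract, the following is a minimal serial sketch in Python, not the authors' implementation. It assumes coupling J = 1, zero external field, periodic boundaries, and an even lattice size; the function names are hypothetical. It omits multi-spin coding, CUDA, and MPI entirely and only shows the update order that makes the method parallelizable: sites of one color have neighbors only of the other color, so all sites of one color could be updated simultaneously.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep over an L x L lattice, one color at a time.

    Assumes J = 1, no external field, periodic boundaries, and even L.
    """
    size = len(spins)
    for color in (0, 1):
        # Within one color, no site is a neighbor of another, so on a GPU
        # every iteration of these two loops could run as its own thread.
        for i in range(size):
            for j in range(size):
                if (i + j) % 2 != color:
                    continue
                neighbors = (
                    spins[(i - 1) % size][j] + spins[(i + 1) % size][j]
                    + spins[i][(j - 1) % size] + spins[i][(j + 1) % size]
                )
                # Energy change if this spin were flipped.
                delta_e = 2 * spins[i][j] * neighbors
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def magnetization(spins):
    """Absolute magnetization per site."""
    size = len(spins)
    return abs(sum(map(sum, spins))) / (size * size)


if __name__ == "__main__":
    rng = random.Random(42)
    lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
    for _ in range(200):
        checkerboard_sweep(lattice, beta=0.6, rng=rng)
    print("magnetization per site:", magnetization(lattice))
```

In a multi-GPU setting such as the one the abstract describes, each device would hold a part of the lattice and exchange boundary spins between color passes; that communication step is not modeled here.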