The HLRS / University of Stuttgart, Germany, hosts a CRAY workshop on hybrid manycore computing and accelerators in high performance computing, on Monday Oct. 25, 2010. The number of participants is limited to 30, and no fee is charged. Registration, the full workshop program, and more information are available at the workshop website, http://corga.hlrs.de/corga/corga-CrayGPU-2010.
SpeedGo Computing recently announced their development of CUDA bindings for Ruby. Currently, only part of the CUDA Driver API is included. More components such as the CUDA Runtime API will be added to make it as complete as possible. More details as well as sample code can be found in this blog post.
High-quality volume rendering of SPH data requires a complex order-dependent resampling of particle quantities along the view rays. In this paper we present an efficient approach to perform this task using a novel view-space discretization of the simulation domain. Our method draws upon recent work on GPU-based particle voxelization for the efficient resampling of particles into uniform grids. We propose a new technique that leverages a perspective grid to adaptively discretize the view-volume, giving rise to a continuous level-of-detail sampling structure and reducing memory requirements compared to a uniform grid. In combination with a level-of-detail representation of the particle set, the perspective grid effectively reduces the number of primitives to be processed at run-time. We demonstrate the quality and performance of our method for the rendering of fluid and gas dynamics SPH simulations consisting of many millions of particles.
(Roland Fraedrich, Stefan Auer, and Rüdiger Westermann: “Efficient High-Quality Volume Rendering of SPH Data”, IEEE Transactions on Visualization and Computer Graphics (Proceedings of IEEE Visualization 2010), vol. 16, no. 6, Nov.-Dec. 2010, Link to project webpage including paper, pictures and video)
Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching.
We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.
(Stivala, A. and Stuckey, P. and Wirth, A.: “Fast and accurate protein substructure searching with simulated annealing and GPUs”. BMC Bioinformatics, 11:446, Sep. 2010, DOI)
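For readers unfamiliar with simulated annealing, the following is a minimal, generic sketch of the search loop in Python. It is not the authors' tableau-based method: the scoring function, the swap move, the cooling schedule, and all names (`anneal`, `score`, `swap_two`) are hypothetical stand-ins chosen for illustration, with a toy matching problem in place of protein structure comparison.

```python
import math
import random

def anneal(score, initial, neighbour, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize score(state) with a basic simulated annealing loop."""
    rng = random.Random(seed)
    current = initial
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        candidate = neighbour(current, rng)
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

# Toy objective (hypothetical): find a permutation of `query` that lines up with `target`.
query = [3, 1, 4, 1, 5, 9, 2, 6]
target = sorted(query)

def score(perm):
    # Count positions where the permuted query agrees with the target.
    return sum(1 for i, p in enumerate(perm) if query[p] == target[i])

def swap_two(perm, rng):
    # Neighbour move: swap two distinct positions of the permutation.
    i, j = rng.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

start = list(range(len(query)))
best, best_score = anneal(score, start, swap_two)
print(best_score, "of", len(query), "positions matched")
```

Independent annealing runs (for example, one per database structure or per restart) do not share state, which is the kind of property that makes such a search a plausible fit for a GPU.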
The MOSIX group announces the availability of the first release of the MOSIX Virtual OpenCL (VCL) package, which allows OpenCL applications to transparently utilize many GPU devices in clusters. In the VCL run-time environment all the cluster devices are seen as if they are located in each hosting-node – applications need not be aware which nodes and devices are available and where the devices are located. As such, VCL benefits OpenCL applications that can use multiple devices concurrently.
VCL can be used to build powerful parallel GPU-based clusters from low-cost multi-core hosting nodes that can transparently utilize cluster-wide CPU and GPU resources.
Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, this co-processing requires data transfer between the main memory and the GPU memory via a low-bandwidth PCI-E bus. The overhead of such data transfer becomes an important factor, even a bottleneck, for query co-processing performance on the GPU. In this paper, we propose to use compression to alleviate this performance problem. Specifically, we implement nine lightweight compression schemes on the GPU and further study the combinations of these schemes for a better compression ratio. We design a compression planner to find the optimal combination. Our experiments demonstrate that the GPU-based compression and decompression achieved processing speeds of up to 45 and 56 GB/s, respectively. Using partial decompression, we were able to significantly improve GPU-based query co-processing performance. As a side product, we have integrated our GPU-based compression into MonetDB, an open source column-oriented DBMS, and demonstrated the feasibility of offloading compression and decompression to the GPU.
(Wenbin Fang, Bingsheng He, Qiong Luo: “Database Compression on Graphics Processors”, PVLDB/VLDB 2010. Link to PDF.)
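As a rough illustration of what a combination of lightweight schemes can look like, the sketch below cascades two textbook schemes, delta encoding followed by run-length encoding, on a sorted integer column. It is CPU-side Python for readability only. The choice of these two schemes, the function names, and the sample column are assumptions made for illustration; the paper's nine schemes and its planner are not described in this summary.

```python
from itertools import groupby

def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]

def compress(column):
    # Cascade: delta encoding turns a dense sorted column into long runs,
    # which run-length encoding then collapses.
    return rle_encode(delta_encode(column))

def decompress(runs):
    return delta_decode(rle_decode(runs))

column = list(range(1000, 1016))  # hypothetical sorted key column, 16 values
packed = compress(column)
print(packed)                     # two (value, run_length) pairs
```

Neither scheme compresses this column well on its own (it has no repeated values before the delta step), which is the motivation for searching over combinations rather than picking a single scheme.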
This workshop is organized by the Pan-American Advanced Studies Institute (PASI). Thanks to NSF and DOE funding, travel support is available for up to 30 graduate students and postdoctoral fellows from the US and the rest of the Americas. Applications can be submitted online until October 1st.
- David Keyes, Columbia University and KAUST
- Tsuyoshi Hamada, Tokyo Institute of Technology
OpenNL (Open Numerical Library) is a library for solving sparse linear systems on CPUs and GPUs. Features include various preconditioned Krylov subspace solvers for several data structures. The library is explicitly designed for easy interfacing with existing codes and their storage schemes.
Highlights of version 3.2.1 include:
- Support for double precision on the GPU
- Support for the Fermi architecture
We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively parallel processing power of the GPU to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that the GPU brings significantly higher throughput than the CPU-only implementation, confirming the effectiveness of the GPU for computation- and memory-intensive operations in packet processing.
(Sangjin Han, Keon Jang, KyoungSoo Park and Sue Moon: “PacketShader: A GPU-accelerated Software Router”, Proceedings of ACM SIGCOMM 2010, Delhi, India, September 2010. Project webpage. DOI)
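IPv4 forwarding, one of the applications named above, comes down to a longest-prefix-match lookup on each packet's destination address, and every lookup is independent of the others. The Python sketch below shows that lookup over a small batch of addresses. It is not PacketShader's implementation: the linear scan, the routing entries, and the names `build_table`, `lookup`, and `forward_batch` are hypothetical and chosen for clarity, whereas a real router would use a far faster lookup structure.

```python
import ipaddress

def build_table(routes):
    """Convert (prefix, next_hop) pairs into (network, mask, prefix_len, next_hop) tuples."""
    table = []
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        table.append((int(net.network_address), int(net.netmask), net.prefixlen, hop))
    # Longest prefixes first, so the first match found is the most specific one.
    table.sort(key=lambda entry: entry[2], reverse=True)
    return table

def lookup(table, addr):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    a = int(ipaddress.ip_address(addr))
    for net, mask, _, hop in table:
        if (a & mask) == net:
            return hop
    return None

def forward_batch(table, addrs):
    # No lookup depends on another, so a batch like this could be
    # spread across many parallel threads.
    return [lookup(table, a) for a in addrs]

# Hypothetical routing table and packet destinations.
routes = [("0.0.0.0/0", "gw"), ("10.0.0.0/8", "eth1"), ("10.1.0.0/16", "eth2")]
table = build_table(routes)
hops = forward_batch(table, ["10.1.2.3", "10.9.9.9", "192.0.2.1"])
print(hops)
```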
With an increasing amount of data and user demands for fast query processing, the optimization of database operations continues to be a challenging task. A common optimization method is to leverage parallel hardware architectures. With the introduction of general-purpose GPU computing, massively parallel hardware has become available within commodity hardware. To efficiently exploit this technology, we introduce the method of speculative query processing. This speculative query processing works on, but is not limited to, a prefix tree structure to efficiently support heavily used database index operations. Fundamentally, our approach traverses a prefix tree structure in a speculative, parallel way instead of step by step. To show the benefits and opportunities of our novel approach, we present an exhaustive evaluation on a graphics processing unit.
(Volk, P. B.; Habich, D.; Lehner, W.: “GPU-Based Speculative Query Processing for Database Operations”. First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), held in conjunction with VLDB 2010, September 2010. Link to project webpage.)