Alenka – A GPU database engine including compression

November 28th, 2011

Support for several types of compression has been added to Alenka, the GPU-based database engine. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression can significantly reduce or eliminate I/O bottlenecks in analytical queries, as Alenka's results in the Star Schema Benchmark and TPC-H show.
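The post does not describe how Alenka stores compressed columns. The following is only a minimal single-column sketch of the three named schemes, in plain Python. All function names are hypothetical. On a GPU, the encode and decode steps would be element-wise kernels, and the FOR-DELTA decode would be a parallel prefix scan rather than the sequential loop used here.

```python
from typing import Dict, List, Tuple


def for_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Frame of reference: keep one base (the minimum) plus small non-negative offsets."""
    base = min(values)
    return base, [v - base for v in values]


def for_decode(base: int, offsets: List[int]) -> List[int]:
    return [base + o for o in offsets]


def for_delta_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """FOR-DELTA: take differences of consecutive values, then FOR-encode the differences."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    base, offsets = for_encode(deltas) if deltas else (0, [])
    return values[0], base, offsets


def for_delta_decode(first: int, base: int, offsets: List[int]) -> List[int]:
    out = [first]
    for d in for_decode(base, offsets):
        out.append(out[-1] + d)  # running sum; a GPU version would use a parallel scan
    return out


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary: replace each value by its index in a table of distinct values."""
    dictionary = sorted(set(values))
    code_of: Dict[str, int] = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[c] for c in codes]


def bits_needed(codes: List[int]) -> int:
    """Bit width a packed array of these non-negative codes would need."""
    return max(max(codes).bit_length(), 1)
```

Here is a worked example on the date column `[20110101, 20110102, 20110102, 20110105, 20110107]`. FOR stores the base 20110101 with offsets `[0, 1, 1, 4, 6]`, which fit in 3 bits each. FOR-DELTA stores the first value with deltas `[1, 0, 3, 2]`, which fit in 2 bits each. Either way, the offsets would be packed far more tightly than full-width integers.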

Alenka – SQL for CUDA

May 11th, 2011

Alenka is a columnar SQL-like language for data processing on CUDA hardware. It uses vector-based processing to perform SQL operations such as joins, group-bys and sorts. It can process very large data sets that do not fit into GPU or host memory: such sets are partitioned into pieces that are processed separately. Get it here: https://sourceforge.net/projects/alenka/files/
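The following sketch shows the partition-and-process idea on a filtered group-by. It is not Alenka's code or query syntax. The query shape, the piece size `rows_per_piece`, and the function names are hypothetical. Each piece is handled with whole-column steps (a predicate mask, a compaction, an aggregation), and the partial results are merged, so no step ever needs more than one piece in memory.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def partitions(keys: List[str], vals: List[int],
               rows_per_piece: int) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield column pieces assumed small enough to fit in GPU memory."""
    for start in range(0, len(keys), rows_per_piece):
        yield keys[start:start + rows_per_piece], vals[start:start + rows_per_piece]


def filtered_group_sum(keys: List[str], vals: List[int],
                       threshold: int, rows_per_piece: int) -> Dict[str, int]:
    """SELECT key, SUM(val) WHERE val > threshold GROUP BY key, one piece at a time."""
    totals: Dict[str, int] = defaultdict(int)
    for k_piece, v_piece in partitions(keys, vals, rows_per_piece):
        mask = [v > threshold for v in v_piece]          # predicate over the whole piece
        sel_k = [k for k, m in zip(k_piece, mask) if m]  # compaction of selected rows
        sel_v = [v for v, m in zip(v_piece, mask) if m]
        for k, v in zip(sel_k, sel_v):                   # aggregate this piece and
            totals[k] += v                               # merge into the running totals
    return dict(totals)
```

The result does not depend on the piece size. Only peak memory use does.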

GID2011 Submission deadline April 22nd

April 13th, 2011

The deadline for submissions to the “GPUs in Databases” (GID2011) workshop has been extended [ed: again...] to April 22nd, 2011. The workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future developments in this application domain. See our previous post for details.

GID2011 Deadline Extension

March 29th, 2011

The deadline for submissions to the “GPUs in Databases” (GID2011) workshop has been extended to April 12th, 2011. The workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future developments in this application domain. The workshop topics are listed in the call for papers below.

CfP: First ADBIS workshop on GPUs in Databases (GID 2011)

December 22nd, 2010

The “GPUs in Databases” workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future developments in this application domain. The workshop topics include, but are not limited to:

  • GPU-based data compression (lossless/lossy compression and decompression, real-time compression and decompression of multimedia)
  • GPUs in databases and data warehouses (join processing, data indexing, data aggregation, bulk query processing, analytical query processing)
  • Data mining using GPUs (classification, frequent itemsets and association rules, frequent subgraphs, sequential patterns, clustering, social networks mining, regression)
  • GPUs in streaming databases (query processing in streaming databases, stream compression/decompression)
  • Applications of GPUs in bioinformatics

The workshop will take place on September 19th, 2011, co-located with ADBIS 2011 in Vienna, Austria. Submissions are due April 5th, 2011. All accepted submissions will be published in the CEUR Workshop Proceedings, and the best papers will also be published in Lecture Notes in Computer Science and Foundations of Computing and Decision Sciences.

More detailed information can be found at the workshop website http://gid2011.cs.put.poznan.pl.

Database Compression on Graphics Processors

September 11th, 2010

Abstract:

Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, this co-processing requires data transfer between main memory and GPU memory via a low-bandwidth PCI-E bus. The overhead of such data transfer becomes an important factor, even a bottleneck, for query co-processing performance on the GPU. In this paper, we propose to use compression to alleviate this performance problem. Specifically, we implement nine lightweight compression schemes on the GPU and further study the combinations of these schemes for a better compression ratio. We design a compression planner to find the optimal combination. Our experiments demonstrate that GPU-based compression and decompression achieved processing speeds of up to 45 GB/s and 56 GB/s, respectively. Using partial decompression, we were able to significantly improve GPU-based query co-processing performance. As a side product, we have integrated our GPU-based compression into MonetDB, an open source column-oriented DBMS, and demonstrated the feasibility of offloading compression and decompression to the GPU.
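The following toy planner illustrates what combining schemes and choosing the best combination can mean. It is not the paper's planner or cost model. The three schemes (NS, DELTA and RLE) are examples of the lightweight kind the abstract describes. The 64-bit cost per stored base value, the size formula, and the exhaustive search over four fixed cascades are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

BASE_BITS = 64  # assumed cost of storing one base value or first value


def ns_bits(col: List[int]) -> int:
    """Null suppression after FOR: every value stored in the same minimal bit width."""
    if not col:
        return 0
    base = min(col)
    width = max(max(v - base for v in col).bit_length(), 1)
    return BASE_BITS + len(col) * width


def delta(col: List[int]) -> List[int]:
    return [b - a for a, b in zip(col, col[1:])]


def rle(col: List[int]) -> Tuple[List[int], List[int]]:
    """Run-length encoding: (distinct run values, run lengths)."""
    values: List[int] = []
    lengths: List[int] = []
    for v in col:
        if values and values[-1] == v:
            lengths[-1] += 1
        else:
            values.append(v)
            lengths.append(1)
    return values, lengths


def rle_ns_bits(col: List[int]) -> int:
    values, lengths = rle(col)
    return ns_bits(values) + ns_bits(lengths)


# Each candidate plan is a cascade of schemes whose output is finally bit-packed (NS).
PLANS: Dict[str, Callable[[List[int]], int]] = {
    "NS": ns_bits,
    "DELTA+NS": lambda c: BASE_BITS + ns_bits(delta(c)),
    "RLE+NS": rle_ns_bits,
    "DELTA+RLE+NS": lambda c: BASE_BITS + rle_ns_bits(delta(c)),
}


def plan(col: List[int]) -> Tuple[str, int]:
    """Estimate the size of every cascade and return the cheapest (name, bits)."""
    sizes = {name: cost(col) for name, cost in PLANS.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]
```

Consider a sorted column made of three long runs, such as `[100] * 50 + [101] * 50 + [102] * 50`. Under these assumptions, RLE+NS wins (137 bits versus 364 bits for plain NS). For a column of unrelated values such as `[5, 900, 17, 3000, 42]`, plain NS is cheapest. This is why choosing a combination per column can pay off.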

(Wenbin Fang, Bingsheng He, Qiong Luo: “Database Compression on Graphics Processors”, PVLDB/VLDB 2010.)

GPU-Based Speculative Query Processing for Database Operations

September 5th, 2010

Abstract:

With an increasing amount of data and user demands for fast query processing, the optimization of database operations continues to be a challenging task. A common optimization method is to leverage parallel hardware architectures. With the introduction of general-purpose GPU computing, massively parallel hardware has become available within commodity hardware. To efficiently exploit this technology, we introduce the method of speculative query processing. This speculative query processing works on, but is not limited to, a prefix tree structure to efficiently support heavily used database index operations. Fundamentally, our approach traverses a prefix tree structure in a speculative, parallel way instead of step by step. To show the benefits and opportunities of our novel approach, we present an exhaustive evaluation on a graphics processing unit.
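The abstract does not give the algorithm. The following is one hypothetical reading of speculative, parallel prefix-tree traversal, not the authors' method. The key is cut into fixed-width fragments. In phase 1, every trie level resolves the child chosen by its fragment for *every* node on that level, independently of the other levels, so the work could be spread across GPU threads. Most of that work is wasted, and the waste is the speculation. Phase 2 then just chains the precomputed answers. The array layout and all names are assumptions.

```python
from typing import List, Optional

# levels[i][node][fragment] -> node id on level i + 1 (payload id on the last level); -1 = absent
Levels = List[List[List[int]]]


def fragments_of(key: int, depth: int, bits: int) -> List[int]:
    """Split a key into `depth` fragments of `bits` bits, most significant first."""
    mask = (1 << bits) - 1
    return [(key >> (bits * (depth - 1 - i))) & mask for i in range(depth)]


def build_trie(keys: List[int], depth: int, bits: int) -> Levels:
    fanout = 1 << bits
    levels: Levels = [[[-1] * fanout]] + [[] for _ in range(depth - 1)]
    for value_id, key in enumerate(keys):
        node = 0
        for i, f in enumerate(fragments_of(key, depth, bits)):
            if i == depth - 1:
                levels[i][node][f] = value_id  # last level stores the payload
            else:
                if levels[i][node][f] == -1:
                    levels[i + 1].append([-1] * fanout)
                    levels[i][node][f] = len(levels[i + 1]) - 1
                node = levels[i][node][f]
    return levels


def lookup_step_by_step(levels: Levels, key: int, depth: int, bits: int) -> Optional[int]:
    node = 0
    for i, f in enumerate(fragments_of(key, depth, bits)):
        node = levels[i][node][f]  # each step must wait for the previous one
        if node == -1:
            return None
    return node


def lookup_speculative(levels: Levels, key: int, depth: int, bits: int) -> Optional[int]:
    frags = fragments_of(key, depth, bits)
    # Phase 1: all (level, node) pairs are independent; transitions are computed for
    # every node, although only one node per level lies on the real path.
    transitions = [[children[f] for children in levels[i]] for i, f in enumerate(frags)]
    # Phase 2: chain the precomputed transitions; only cheap array reads remain.
    node = 0
    for t in transitions:
        node = t[node]
        if node == -1:
            return None
    return node
```

Both lookups return the same answer. The speculative version trades extra work for the removal of the level-to-level dependency during the expensive part.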

(Volk, P. B.; Habich, D.; Lehner, W.: “GPU-Based Speculative Query Processing for Database Operations”. First International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS’10), held in conjunction with VLDB 2010, September 2010.)

A Fast Similarity Join Algorithm Using Graphics Processing Units

May 25th, 2008

This paper by Lieberman et al. at the University of Maryland describes an application of GPU processing to the similarity join, a common spatial database operation with many applications. A similarity join takes two sets of points A and B and returns all pairs (p, q) with p ∈ A and q ∈ B such that the distance D(p, q) ≤ ε. An algorithm named LSS is presented that executes on a GPU, taking advantage of the GPU’s parallelism and high data throughput. To achieve peak efficiency, LSS relies only on simple primitive operations that execute quickly on the GPU, such as the sorting and searching of arrays. It recasts the similarity join as a sort-and-search problem by mapping its input datasets onto a set of space-filling curves, generated by a parallel sort routine on the GPU. It then searches small intervals of these curves that are guaranteed to contain all pairs of the correct result. LSS offers a balance between time and work efficiency and is shown to perform well when compared against existing prominent high-dimensional similarity join methods. (M. D. Lieberman, J. Sankaranarayanan, and H. Samet. A fast similarity join algorithm using graphics processing units. In Proceedings of the 24th IEEE International Conference on Data Engineering, pages 1111-1120, Cancun, Mexico, April 2008.)
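The following sketch illustrates the sort-and-search recasting on 2-D points with Euclidean distance. It deliberately swaps in a simpler ordering: it sorts B by the x coordinate alone, whereas LSS sorts along several space-filling curves. The search step keeps the same guarantee the summary describes. Because |p.x − q.x| ≤ D(p, q), every match for p lies in one contiguous interval of the sorted order, and a binary search finds that interval. The function name is hypothetical.

```python
import bisect
import math
from typing import List, Tuple

Point = Tuple[float, float]


def similarity_join(A: List[Point], B: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """Return all (p, q), p in A, q in B, with Euclidean distance D(p, q) <= eps."""
    # Sort phase: order B once by x (LSS uses space-filling curves instead).
    B_sorted = sorted(B)
    xs = [q[0] for q in B_sorted]
    pairs: List[Tuple[Point, Point]] = []
    for p in A:
        # Search phase: D(p, q) <= eps implies |p.x - q.x| <= eps, so all matches
        # lie in one contiguous interval of the sorted order.
        lo = bisect.bisect_left(xs, p[0] - eps)
        hi = bisect.bisect_right(xs, p[0] + eps)
        # Refine: the interval may hold false positives, so check the real distance.
        for q in B_sorted[lo:hi]:
            if math.dist(p, q) <= eps:
                pairs.append((p, q))
    return pairs
```

A single sort order gives wide intervals when many points share similar x values. Using several well-chosen space-filling curves is how LSS keeps the searched intervals small.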

Relational Joins on Graphics Processors

April 2nd, 2008

Abstract: “We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). Taking advantage of GPU features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.” (Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. Relational Joins on Graphics Processors. ACM SIGMOD 2008.)
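The following is a sequential sketch of how a split primitive can drive a partitioned hash join. The histogram, exclusive prefix sum and scatter formulation of split is a common data-parallel pattern and is assumed here. It is not taken from the paper. The partition count and all names are also hypothetical. Every loop in `split` maps to a data-parallel GPU step, and the per-partition build and probe phases are independent of one another.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def split(rows: List[Row], num_parts: int, part_of: Callable[[Row], int]):
    """Split primitive: histogram, exclusive prefix sum, scatter."""
    hist = [0] * num_parts
    for r in rows:                      # 1. count rows per partition
        hist[part_of(r)] += 1
    starts = [0] * num_parts            # 2. exclusive prefix sum gives partition offsets
    for i in range(1, num_parts):
        starts[i] = starts[i - 1] + hist[i - 1]
    out = [None] * len(rows)
    cursor = starts[:]                  # 3. scatter each row into its partition's slots
    for r in rows:
        p = part_of(r)
        out[cursor[p]] = r
        cursor[p] += 1
    return out, starts, hist


def partitioned_hash_join(R: List[Row], S: List[Row], num_parts: int = 4):
    def part_of(row: Row) -> int:
        return hash(row[0]) % num_parts

    R_parts, R_start, R_count = split(R, num_parts, part_of)
    S_parts, S_start, S_count = split(S, num_parts, part_of)
    result = []
    for p in range(num_parts):          # partitions are independent of each other
        table: Dict[int, List[str]] = defaultdict(list)
        for key, r_val in R_parts[R_start[p]:R_start[p] + R_count[p]]:
            table[key].append(r_val)    # build phase
        for key, s_val in S_parts[S_start[p]:S_start[p] + S_count[p]]:
            for r_val in table.get(key, []):  # probe phase
                result.append((key, r_val, s_val))
    return result
```

Splitting first keeps each partition's hash table small. On a GPU, that is what lets each table fit in fast on-chip memory.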

GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management

April 4th, 2006

GPUTeraSort sorts billion-record wide-key databases using the data and task parallelism on the graphics processing unit (GPU) to perform memory-intensive and compute-intensive tasks while the CPU performs I/O and resource management. It exploits both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface to achieve higher aggregate memory bandwidth than purely CPU-based algorithms. It also pipelines disk transfers to achieve near-peak I/O performance. GPUTeraSort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. We tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with a $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60 GB for a penny, the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks. (GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management. Naga K. Govindaraju, Jim Gray, Ritesh Kumar, and Dinesh Manocha. Proceedings of ACM SIGMOD 2006.)
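The two phases above follow the structure of a classic external merge sort. The following in-memory sketch shows that structure only. Phase 1 builds keys, sorts fixed-size chunks (the GPU's job in GPUTeraSort) and emits sorted runs. Phase 2 performs a k-way merge of the runs. The real system writes runs to disk and overlaps I/O with sorting, which this sketch does not model. The run size and function names are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple

Record = Tuple[int, int]  # (key, payload), e.g. a key plus a pointer to the full record


def generate_runs(records: List[Record], run_size: int,
                  key: Callable[[Record], int]) -> List[List[Record]]:
    """Phase 1: read a chunk, sort it by key (on the GPU in GPUTeraSort), emit a run."""
    return [sorted(records[i:i + run_size], key=key)
            for i in range(0, len(records), run_size)]


def merge_runs(runs: List[List[Record]], key: Callable[[Record], int]) -> List[Record]:
    """Phase 2: k-way merge of the sorted runs."""
    return list(heapq.merge(*runs, key=key))


def two_phase_sort(records: List[Record], run_size: int,
                   key: Callable[[Record], int]) -> List[Record]:
    return merge_runs(generate_runs(records, run_size, key), key)
```

In the real pipeline, `run_size` would be chosen so that each chunk fits in GPU memory. The merge phase then only needs enough memory to buffer the head of each run.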
