Due to the ever-increasing demand for fast processing of large analytical workloads, main-memory column-oriented databases have attracted considerable attention in recent years. In-memory databases eliminate the disk I/O barrier by storing data in memory. In addition, they use a column-oriented data layout to offer a multi-core-friendly and memory-bandwidth-efficient processing scheme. Meanwhile, graphics processing units (GPUs) have emerged as powerful tools for general high-performance computing. GPUs are affordable and energy-efficient devices that deliver massive computational power through a large number of cores and high memory bandwidth, and they can serve as co-processors for query acceleration in in-memory databases. One of the main bottlenecks in GPU acceleration of in-memory databases is the need to transfer data back and forth between GPU memory and RAM over a low-bandwidth PCIe bus. To address this problem, this study proposes a new generation of in-memory databases that, instead of keeping data in main memory, stores it in GPU device memory.
(Pedram Ghodsnia: “An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads”, VLDB 2012 PhD Workshop, Istanbul, Turkey, August 2012. [PDF])
The high performance of modern graphics processing units may be utilized not only for graphics-related applications but also for general computing. This computing power has been harnessed in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains benefit strongly from GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases (GID) workshop is to fill this gap. The event is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future development of this application domain. The list of topics of the GID workshop includes (but is not limited to):
- Data compression on GPUs
- GPUs in databases and data warehouses
- Data mining using GPUs
- Stream processing
- Applications of GPUs in bioinformatics
- Data-oriented GPU primitives
For details please visit gid.us.to.
This publication describes efficient low-level algorithms for performing relational queries on parallel processors such as NVIDIA's Fermi or Kepler. Relations are stored in GPU memory as sorted arrays of tuples and manipulated by relational operators that preserve the sorted property. Most significantly, this work introduces algorithms for JOIN and SET INTERSECTION/UNION/DIFFERENCE that can process data at over 50 GB/s.
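The key property above is that operators consume and produce sorted arrays, so they can be chained without re-sorting. A minimal sequential sketch of a sorted-set operator (the merge-style intersection idea, not the paper's actual GPU kernels) looks like this:

```python
def sorted_intersection(a, b):
    """Merge-style intersection of two sorted key arrays.

    Both inputs are sorted and the output stays sorted, which is the
    property that lets sorted-array relational operators be composed.
    Sequential illustration only; the paper's versions run data-parallel
    on the GPU.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a's key too small, advance a
        elif a[i] > b[j]:
            j += 1          # b's key too small, advance b
        else:
            out.append(a[i])  # match: emit once, advance both
            i += 1
            j += 1
    return out
```

UNION and DIFFERENCE follow the same two-cursor merge pattern, differing only in which branch emits output.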
Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations. Examples include general-purpose graphics processing units (GPUs) such as NVIDIA's Fermi, Intel's Sandy Bridge, and AMD's Fusion processors. This paper explores the mapping of primitive relational algebra operations onto GPUs. In particular, we focus on algorithm and data structure design, identifying a fundamental conflict between the structure of algorithms with good computational complexity and that of algorithms with memory access patterns and instruction schedules that achieve peak machine utilization. To reconcile this conflict, our design space exploration converges on a hybrid multi-stage algorithm that devotes a small amount of the total runtime to pruning input data sets using an irregular algorithm with good computational complexity. The partial results are then fed into a regular algorithm that achieves near-peak machine utilization. The design process leading to the most efficient algorithm for each stage is described, detailing alternative implementations, their performance characteristics, and an explanation of why they were ultimately abandoned. The least efficient algorithm (JOIN) achieves 57%-72% of peak machine performance depending on the density of the input. The most efficient algorithms (PRODUCT, PROJECT, and SELECT) achieve 86%-92% of peak machine performance across all input data sets. To the best of our knowledge, these represent the best published results to date for any implementation.
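The prune-then-regular shape described above can be illustrated with a small sequential sketch. The two-stage structure comes from the paper; the concrete stages here (binary-search pruning to the overlapping key range, then a plain merge join) are illustrative stand-ins, not the authors' GPU kernels:

```python
import bisect

def pruned_merge_join(left, right):
    """Two-stage join sketch over sorted (key, value) tuple lists.

    Stage 1 (irregular, good complexity): binary-search each input down
    to the range of keys that can possibly match, discarding the rest.
    Stage 2 (regular, high utilization on real hardware): a simple
    merge join over the pruned ranges, emitting the cross product of
    equal-key runs.
    """
    if not left or not right:
        return []
    # Stage 1: prune to the overlapping key range [lo, hi].
    lo = max(left[0][0], right[0][0])
    hi = min(left[-1][0], right[-1][0])
    lkeys = [k for k, _ in left]
    rkeys = [k for k, _ in right]
    l = left[bisect.bisect_left(lkeys, lo):bisect.bisect_right(lkeys, hi)]
    r = right[bisect.bisect_left(rkeys, lo):bisect.bisect_right(rkeys, hi)]
    # Stage 2: regular merge join over the pruned inputs.
    out, i, j = [], 0, 0
    while i < len(l) and j < len(r):
        if l[i][0] < r[j][0]:
            i += 1
        elif l[i][0] > r[j][0]:
            j += 1
        else:
            k = l[i][0]
            i2 = i
            while i2 < len(l) and l[i2][0] == k:
                j2 = j
                while j2 < len(r) and r[j2][0] == k:
                    out.append((k, l[i2][1], r[j2][1]))
                    j2 += 1
                i2 += 1
            i = i2
            while j < len(r) and r[j][0] == k:
                j += 1
    return out
```

The point of the split is that the cheap irregular pass shrinks the work handed to the regular pass, whose uniform memory access pattern is what approaches peak bandwidth on a throughput-oriented processor.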
This work lays the foundation for the development of a relational database system that achieves good scalability on a Multi-Bulk-Synchronous-Parallel (M-BSP) processor architecture. Additionally, the algorithm design may offer insights into the design of parallel and distributed relational database systems. It leaves the problems of query planning, operator→query synthesis, corner case optimization, and system/OS interaction as future work that would be necessary for commercial deployment.
(Gregory Diamos, Ashwin Lele, Jin Wang, Sudhakar Yalamanchili: “Relational Algorithms for Multi-Bulk-Synchronous Processors”, NVIDIA Tech Report, June 2012. [WWW])
This report describes the advantages of using GPUs for analytical queries. It compares the performance of the Alenka database engine on a GPU with that of Oracle on a SPARC server. More information on Alenka, including source code: https://github.com/antonmks/Alenka
In recent years, utilizing graphics processing units for general processing has become a very popular approach to low-cost high-performance computing. Algorithms from many application domains of computer science have been adapted to GPUs to increase processing efficiency. Unfortunately, while other application domains benefit strongly from GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases workshop is to fill this gap. The event is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future development of this application domain.
The list of topics includes: data compression on GPUs, GPUs in databases and data warehouses, data mining using GPUs, stream processing, applications of GPUs in bioinformatics, and data-oriented GPU primitives.
Support for several types of compression has been added to the GPU-based database engine ålenkå. Supported algorithms include FOR (frame of reference), FOR-DELTA, and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. Compression makes it possible to significantly reduce or eliminate I/O bottlenecks in analytical queries, as shown by ålenkå's results on the Star Schema and TPC-H benchmarks.
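To give a feel for the first of these schemes, here is a minimal sequential sketch of frame-of-reference encoding (illustrative only, not ålenkå's GPU implementation; real engines bit-pack the offsets rather than keep them as a Python list):

```python
def for_encode(values):
    """Frame-of-reference (FOR) encoding sketch.

    Stores the column minimum once as a base, and each value as a small
    non-negative offset from that base. The bit width needed for the
    largest offset determines the packed size per value.
    """
    base = min(values)
    offsets = [v - base for v in values]
    width = max(offsets).bit_length()  # bits needed per packed offset
    return base, width, offsets

def for_decode(base, width, offsets):
    """Inverse of for_encode: add the base back to every offset."""
    return [base + o for o in offsets]
```

FOR-DELTA applies the same idea to differences between consecutive values, which is why it works well on sorted or slowly-varying columns.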
Alenka is a columnar, SQL-like language for data processing on CUDA hardware. It uses vector-based processing to perform SQL operations such as joins, groups, and sorts. The program can process very large data sets that do not fit into GPU or host memory: such sets are partitioned into pieces and processed separately. Get it here: https://sourceforge.net/projects/alenka/files/
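The partition-and-process scheme mentioned above can be sketched in a few lines. This is a generic out-of-core pattern, not Alenka's actual code; the chunk size and the aggregate are illustrative assumptions:

```python
def chunked_sum(column, chunk_rows=2):
    """Out-of-core processing sketch: a column too large for device
    memory is split into fixed-size pieces, each piece is processed
    independently (here, summed; on a GPU this would be a kernel over
    a piece copied to device memory), and partial results are combined
    on the host.
    """
    total = 0
    for start in range(0, len(column), chunk_rows):
        piece = column[start:start + chunk_rows]  # one device-sized piece
        total += sum(piece)                       # per-piece partial result
    return total
```

Any aggregate that decomposes into per-piece partials plus a final combine (sum, count, min/max) fits this pattern directly; joins and sorts need extra per-piece bookkeeping.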
The deadline for submissions to the “GPUs in Databases” (GID 2011) workshop has been extended [ed: again...] to April 22nd, 2011. The workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future development of this application domain. See our previous post for details.
The deadline for submissions to the “GPUs in Databases” (GID 2011) workshop has been extended to April 12th, 2011. The workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future development of this application domain. The workshop topics include, but are not limited to:
The “GPUs in Databases” workshop is devoted to sharing knowledge about applying GPUs in database environments and to discussing possible future development of this application domain. The workshop topics include, but are not limited to:
- GPU based data compression (lossless/lossy compression and decompression, real time compression and decompression of multimedia)
- GPUs in databases and data warehouses (join processing, data indexing, data aggregation, bulk query processing, analytical query processing)
- Data mining using GPUs (classification, frequent itemsets and association rules, frequent subgraphs, sequential patterns, clustering, social networks mining, regression)
- GPUs in streaming databases (query processing in streaming databases, stream compression/decompression)
- Applications of GPUs in bioinformatics
The workshop will take place on September 19th, 2011, co-located with ADBIS 2011 in Vienna, Austria. Submissions are due April 5th, 2011. All accepted submissions will be published in the CEUR workshop proceedings, and the best papers will also appear in Lecture Notes in Computer Science and in Foundations of Computing and Decision Sciences.
More detailed information can be found at the workshop website http://gid2011.cs.put.poznan.pl.