From a recent product announcement:
DeepCloud Whirlwind is an analytics-only SQL database that uses modern GPUs for accelerated SQL processing. We see a performance increase of over 700x compared with a “well known” database on the same machine. Features include:
- column based storage
- vector processing
- SSD optimized
- smart compression – Ultra fast compression and decompression on the GPU
- MySQL like API – works with many MySQL client tools
- Oracle subset dialect
- data skipping
- zone maps
- fast schema-light data loading
Use Whirlwind database technology to get maximum database performance from significantly cheaper hardware, or go all out with a state-of-the-art system built from modern components. Beta available now under the GPL at: http://deepcloud.co
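For readers unfamiliar with the "zone maps" and "data skipping" items in the feature list, here is a minimal CPU-side sketch of the general idea in Python. It is not taken from Whirlwind; the names `Zone`, `build_zone_map` and `scan_range` are hypothetical and chosen for illustration only. Each block of a column keeps its minimum and maximum, so a range predicate can skip any block whose bounds cannot match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Zone:
    start: int         # row index of the first value in this block
    values: List[int]  # the block's column values
    lo: int            # minimum value in the block
    hi: int            # maximum value in the block


def build_zone_map(column: List[int], block_size: int) -> List[Zone]:
    """Split a column into fixed-size blocks and record min/max per block."""
    zones = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        zones.append(Zone(start, block, min(block), max(block)))
    return zones


def scan_range(zones: List[Zone], low: int, high: int) -> Tuple[List[int], int]:
    """Return matching row indices and the number of blocks actually scanned."""
    hits: List[int] = []
    scanned = 0
    for z in zones:
        if z.hi < low or z.lo > high:
            continue  # data skipping: the block cannot contain a match
        scanned += 1
        hits.extend(z.start + i for i, v in enumerate(z.values) if low <= v <= high)
    return hits, scanned
```

On a column holding the values 0 to 99 in blocks of 10, a query for the range 25 to 34 touches only two of the ten blocks. The benefit depends on the data being at least roughly clustered on the filtered column.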
The high performance of modern Graphics Processing Units may be utilized not only for graphics-related applications but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases workshop is to fill this gap. This event is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future development of this application domain.
ADBIS workshop on GPUs In Databases GID 2014, September 7th, 2014, Ohrid, Republic of Macedonia. More information: http://gid.us.to
Due to the ever-increasing demand for fast processing of large analytical workloads, main memory column-oriented databases have attracted a lot of attention in recent years. In-memory databases eliminate the disk I/O barrier by storing the data in memory. In addition, they utilize a column-oriented data layout to offer a multi-core-friendly and memory-bandwidth-efficient processing scheme. On the other hand, recently, graphics processing units (GPUs) have emerged as powerful tools for general high-performance computing. GPUs are affordable and energy-efficient devices that deliver massive computational power by utilizing a large number of cores and a high memory bandwidth. GPUs can be used as co-processors for query acceleration of in-memory databases. One of the main bottlenecks in GPU acceleration of in-memory databases is the need for data to be transferred back and forth between GPU memory and RAM through a low-bandwidth PCIe bus. To address this problem, in this study, a new generation of in-memory databases is proposed that, instead of keeping data in main memory, stores it in GPU device memory.
(Pedram Ghodsnia: “An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads”, VLDB 2012 PhD Workshop, Istanbul, Turkey, August 2012. [PDF])
The high performance of modern Graphics Processing Units may be utilized not only for graphics-related applications but also for general computing. This computing power has been utilized in new variants of many algorithms from almost every computer science domain. Unfortunately, while other application domains strongly benefit from utilizing GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases (GID) workshop is to fill this gap. This event is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future development of this application domain. The list of topics of the GID workshop includes (but is not limited to):
- Data compression on GPUs
- GPUs in databases and data warehouses
- Data mining using GPUs
- Stream processing
- Applications of GPUs in bioinformatics
- Data oriented GPU primitives
For details please visit gid.us.to.
This publication describes efficient low-level algorithms for performing relational queries on parallel processors, such as NVIDIA Fermi or Kepler. Relations are stored in GPU memory as sorted arrays of tuples and manipulated by relational operators that preserve the sorted property. Most significantly, this work introduces algorithms for JOIN and SET INTERSECTION/UNION/DIFFERENCE that can process data at over 50 GB/s.
Relational databases remain an important application domain for organizing and analyzing the massive volume of data generated as sensor technology, retail and inventory transactions, social media, computer vision, and new fields continue to evolve. At the same time, processor architectures are beginning to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems, lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations. Examples include general purpose graphics processing units (GPUs) such as NVIDIA’s Fermi, Intel’s Sandy Bridge, and AMD’s Fusion processors. This paper explores the mapping of primitive relational algebra operations onto GPUs. In particular, we focus on algorithms and data structure design, identifying a fundamental conflict between the structure of algorithms with good computational complexity and that of algorithms with memory access patterns and instruction schedules that achieve peak machine utilization. To reconcile this conflict, our design space exploration converges on a hybrid multi-stage algorithm that devotes a small amount of the total runtime to pruning input data sets using an irregular algorithm with good computational complexity. The partial results are then fed into a regular algorithm that achieves near peak machine utilization. The design process leading to the most efficient algorithm for each stage is described, detailing alternative implementations, their performance characteristics, and an explanation of why they were ultimately abandoned. The least efficient algorithm (JOIN) achieves 57% − 72% of peak machine performance depending on the density of the input. The most efficient algorithms (PRODUCT, PROJECT, and SELECT) achieve 86% − 92% of peak machine performance across all input data sets. To the best of our knowledge, these represent the best known published results to date for any implementations.
This work lays the foundation for the development of a relational database system that achieves good scalability on a Multi-Bulk-Synchronous-Parallel (M-BSP) processor architecture. Additionally, the algorithm design may offer insights into the design of parallel and distributed relational database systems. It leaves the problems of query planning, operator→query synthesis, corner case optimization, and system/OS interaction as future work that would be necessary for commercial deployment.
(Gregory Diamos, Ashwin Lele, Jin Wang, Sudhakar Yalamanchili: “Relational Algorithms for Multi-Bulk-Synchronous Processors”, NVIDIA Tech Report, June 2012. [WWW])
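To make the "sorted arrays of tuples" representation concrete, here is a small sequential sketch in Python of a set intersection and an inner join over sorted inputs. This is the textbook merge approach, not the hybrid multi-stage GPU algorithm from the report; the function names are hypothetical. It only illustrates why sorted storage helps: both operators make a single forward pass over each input and emit output that is itself sorted by key.

```python
from typing import Any, List, Tuple


def sorted_intersection(a: List[int], b: List[int]) -> List[int]:
    """Intersect two sorted, duplicate-free lists in one forward pass."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def merge_join(left: List[Tuple[int, Any]],
               right: List[Tuple[int, Any]]) -> List[Tuple[int, Any, Any]]:
    """Inner join of two lists of (key, value) tuples, both sorted by key."""
    out: List[Tuple[int, Any, Any]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left tuple with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out
```

A GPU version has to split this work across thousands of threads, which is where the conflict between good computational complexity and regular memory access described in the abstract comes in.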
This report describes advantages of using GPUs for analytical queries. It compares performance of the Alenka database engine using a GPU with the performance of Oracle on a SPARC server. More information on Alenka including source code: https://github.com/antonmks/Alenka
In recent years, utilizing Graphics Processing Units for general processing has become a very popular approach to obtaining low-cost, high-performance computing applications. Algorithms from many computer science application domains have been adapted to utilize GPUs to increase the efficiency of processing. Unfortunately, while other application domains strongly benefit from utilizing GPUs, database-related applications seem not to get enough attention. The main goal of the GPUs in Databases workshop is to fill this gap. This event is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future development of this application domain.
The list of topics includes: data compression on GPU, GPUs in databases and data warehouses, data mining using GPUs, stream processing, applications of GPUs in bioinformatics and data oriented GPU primitives.
Support for several types of compression has been added to the GPU-based database engine ålenkå. Supported algorithms include FOR (frame of reference), FOR-DELTA and dictionary compression. All compression algorithms run on the GPU, achieving compression and decompression speeds of gigabytes per second. The use of compression makes it possible to significantly reduce or eliminate I/O bottlenecks in analytical queries, as shown by ålenkå’s results in the Star Schema and TPC-H benchmarks.
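As a rough illustration of the three schemes named above, here is a CPU-side sketch in Python. It is not ålenkå's code, and the function names are hypothetical; FOR-DELTA is taken here to mean frame-of-reference encoding applied to successive differences, which is the common reading of the term. The aim is just to show why the encoded columns need fewer bits per value.

```python
from typing import Hashable, List, Tuple


def for_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Frame of reference: store the minimum once, then small offsets."""
    base = min(values)
    return base, [v - base for v in values]


def for_decode(base: int, offsets: List[int]) -> List[int]:
    return [base + o for o in offsets]


def for_delta_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """FOR over successive differences: (first value, delta base, offsets)."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    base = min(deltas) if deltas else 0
    return values[0], base, [d - base for d in deltas]


def for_delta_decode(first: int, base: int, offsets: List[int]) -> List[int]:
    out = [first]
    for o in offsets:
        out.append(out[-1] + base + o)
    return out


def dict_encode(values: List[Hashable]) -> Tuple[list, List[int]]:
    """Dictionary encoding: sorted distinct values plus one code per row."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def dict_decode(dictionary: list, codes: List[int]) -> list:
    return [dictionary[c] for c in codes]


def bits_per_value(offsets: List[int]) -> int:
    """Bits needed to store the largest non-negative offset (at least 1)."""
    return max(1, max(offsets).bit_length()) if offsets else 0
```

For example, the column [1000, 1003, 1001, 1007] encodes with FOR as base 1000 and offsets [0, 3, 1, 7], which fit in 3 bits each instead of a full machine word. Smaller columns mean less data to read from disk and move to the GPU, which is how compression relieves the I/O bottleneck mentioned in the post.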
Alenka is a columnar SQL-like language for data processing on CUDA hardware. Alenka uses vector based processing to perform SQL operations like joins, groups and sorts. The program is capable of processing very large data sets that do not fit into GPU or host memory: such sets are partitioned into pieces and processed separately. Get it here: https://sourceforge.net/projects/alenka/files/
The deadline for submissions to the “GPUs in Databases” GID2011 workshop has been extended [ed: again...] to April 22nd, 2011. The “GPUs in Databases” workshop is devoted to sharing knowledge related to applying GPUs in database environments and to discussing possible future development of this application domain. See our previous post for details.