A Multi-GPU Programming Library for Real-Time Applications

January 11th, 2013


We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating-point performance with high data locality and are thus well suited to implementing real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU can be used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.

(Sebastian Schaetz and Martin Uecker: “A Multi-GPU Programming Library for Real-Time Applications”, Algorithms and Architectures for Parallel Processing (2012): 114-128. [DOI] [ARXIV])
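
To put the speed-up figures quoted in the abstract in perspective, they can be converted to parallel efficiency (speed-up divided by the number of GPUs). The snippet below is only a small arithmetic sketch; the function name `parallel_efficiency` and the `reported` dictionary are illustrative and not part of MGPU, and the inputs are the approximate values from the abstract.

```python
def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling that was actually achieved."""
    return speedup / num_gpus


# Approximate speed-ups taken from the abstract above (GPUs -> speed-up).
reported = {2: 1.7, 4: 2.1}

for num_gpus, speedup in reported.items():
    eff = parallel_efficiency(speedup, num_gpus)
    print(f"{num_gpus} GPUs: speed-up {speedup}, efficiency {eff:.1%}")
```

With these numbers, efficiency is about 85% on two GPUs and roughly 53% on four, so most of the gain comes from the second device.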

Processing data streams with hard real-time constraints on heterogeneous systems

March 29th, 2011


Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency, to satisfy the milliseconds-long hard real-time constraints of each stream, and high throughput, to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs hold high potential to speed up the computations. However, their use for hard real-time stream processing is complicated by slow communication with CPUs, variable throughput that changes non-linearly with the input size, and weak consistency of their local memory with respect to CPU accesses. Furthermore, their coarse-grained hardware scheduler renders them unsuitable for unbalanced multi-stream workloads.

We present a general, efficient and practical algorithm for hard real-time stream scheduling in heterogeneous systems. The algorithm assigns incoming streams of different rates and deadlines to CPUs and accelerators. By employing novel stream schedulability criteria for accelerators, the algorithm finds an assignment that simultaneously satisfies the aggregate throughput requirements of all the streams and the deadline constraint of each individual stream.

Using the AES-CBC encryption kernel, we experimented extensively on thousands of streams with realistic rate and deadline distributions. Our framework outperformed the alternative methods by allowing 50% more streams to be processed with provably deadline-compliant execution, even for deadlines as short as tens of milliseconds. Overall, the combined GPU-CPU execution allows for up to a 4-fold throughput increase over highly optimized multi-threaded CPU-only implementations.

(Uri Verner, Assaf Schuster and Mark Silberstein, “Processing data streams with hard real-time constraints on heterogeneous systems”, ICS’11, to appear)
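
The two constraints named in the abstract, an aggregate throughput budget and a per-stream deadline, can be illustrated with a toy assignment routine. This is not the schedulability criteria or the algorithm from the paper; it is a deliberately naive sketch under made-up assumptions: every name here (`Stream`, `assign`, `gpu_latency`, the throughput parameters) is hypothetical, the GPU is modeled as having a single fixed batch latency, and the CPU is assumed to meet any deadline as long as its throughput budget is not exceeded.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    rate: float      # required processing rate, bytes per second
    deadline: float  # per-stream deadline, seconds


def assign(streams, cpu_throughput, gpu_throughput, gpu_latency):
    """Toy split of streams between one CPU and one GPU.

    A stream may go to the GPU only if its deadline tolerates the assumed
    GPU batch latency and the GPU still has throughput to spare; everything
    else goes to the CPU. Returns (cpu_streams, gpu_streams), or None when
    the CPU share exceeds the CPU throughput budget.
    """
    cpu, gpu = [], []
    gpu_load = 0.0
    for s in sorted(streams, key=lambda s: s.deadline):
        if s.deadline >= gpu_latency and gpu_load + s.rate <= gpu_throughput:
            gpu.append(s)
            gpu_load += s.rate
        else:
            cpu.append(s)
    if sum(s.rate for s in cpu) > cpu_throughput:
        return None  # no feasible assignment under this toy model
    return cpu, gpu


streams = [Stream(100, 0.001), Stream(500, 0.05), Stream(500, 0.1)]
result = assign(streams, cpu_throughput=200, gpu_throughput=1000, gpu_latency=0.01)
```

In this example the stream with the 1 ms deadline cannot tolerate the assumed 10 ms GPU latency and stays on the CPU, while the two slower-deadline streams are batched on the GPU.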


GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction

January 12th, 2011


Although trivial background subtraction (BGS) algorithms (e.g. frame differencing, running average) can run quite fast, they are not robust enough to be used in many computer vision problems. More complex algorithms usually give better results, but are too slow to be applied to real-time systems. We propose an improved version of the Extended Gaussian mixture model that utilizes the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation running on a low-end GeForce 9600GT GPU provides at least a 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.

(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
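
For context, the "trivial" running-average baseline mentioned in the abstract can be sketched in a few lines. This is not the Extended Gaussian mixture model from the paper; it is the simple per-pixel scheme the abstract contrasts against, written for flat lists of grayscale values. The function names and the `alpha` and `threshold` defaults are illustrative assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [abs(f - b) > threshold for b, f in zip(background, frame)]


# Three-pixel toy frame: only the middle pixel changes noticeably.
background = [10.0, 10.0, 10.0]
frame = [12, 200, 9]

mask = foreground_mask(background, frame)
background = update_background(background, frame, alpha=0.5)
```

Each pixel is processed independently, which is what makes this family of algorithms easy to parallelize on a GPU; the lack of robustness comes from modeling each pixel with a single average value.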