Hot-Rodding Windows and Linux App Performance with CUDA-Based Plugins

February 28th, 2012

This Dr. Dobb’s article by Rob Farber is a tutorial on creating plugins that accelerate Windows and Linux applications by using CUDA in dynamically loaded libraries.

GPU capabilities can be added to existing Windows and Linux apps simply by using plugins and the built-in support found in CUDA. This easy form of dynamic loading lets CUDA be used selectively to greatly accelerate individual tasks within a larger application.

CUDA is maturing into a natural extension of the emerging CPU/GPU paradigm of high-speed computing, which makes it, and GPU computing in general, a candidate for all application development. A recent article in this tutorial series, Running CUDA Code Natively on x86 Processors, noted recent developments that allow CUDA programs to transparently compile and run on x86 processors. This article focuses on incorporating CUDA into Windows and Linux workflows by exploiting the capabilities of the NVIDIA compiler driver, nvcc, to create native runtime-loadable plugins. Source code is provided to create and use CUDA plugins, and even to dynamically compile and link a CUDA source file into a running application (just as OpenCL does). This tutorial also provides a general “click together tools” framework that can stream arbitrary messages (vectors, arrays, and complex nested structures) among heterogeneous CPU-, GPU-, and CPU+GPU-based applications running within a single workstation, across a network of machines, or within a cloud computing framework. My production version of this same framework has successfully integrated multiple supercomputers and numerous computation nodes into a single unified workflow.
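
To make the runtime-loadable plugin idea concrete, here is a minimal sketch of the host side on Linux, written in C with the POSIX dlopen/dlsym interface. It is not the article's source code. The helper name load_plugin_symbol and the plugin_fn signature are hypothetical, chosen only for illustration. The sketch assumes the plugin was built as a shared library (for example with something like nvcc -shared -Xcompiler -fPIC plugin.cu -o libplugin.so) and that its entry point is declared extern "C" so the symbol name is not mangled. On Windows the equivalent calls are LoadLibrary and GetProcAddress.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry-point signature for a CUDA plugin; the interface
 * used in the article may differ. */
typedef int (*plugin_fn)(const float *in, float *out, int n);

/* Open the shared library at lib_path and look up one symbol in it.
 * On success, stores the library handle in *handle_out (so the caller
 * can dlclose() it later) and returns the symbol address.
 * On failure, prints the loader error, leaves *handle_out untouched,
 * and returns NULL. */
void *load_plugin_symbol(const char *lib_path, const char *symbol,
                         void **handle_out)
{
    assert(handle_out != NULL);

    void *handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

A caller would assign the returned address to a plugin_fn variable, invoke it like any other function, and call dlclose on the handle when the plugin is no longer needed. Because the application only depends on the symbol name and signature, the same loader works whether the library behind it runs on the GPU or on the CPU. Older glibc versions need -ldl at link time.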