“GPU Algorithms for Image Processing and Computer Vision”, to be published by Springer, will contain a collection of articles on fundamental image processing and computer vision methods adapted for Graphics Processing Units (GPUs). In recent years, substantial efforts have been undertaken to adapt many such algorithms for massively parallel GPU-based systems. The book is envisioned as a consolidation of such work into a single volume covering widely used methods and techniques. Each chapter will be written by authors working on a specific group of methods. It will provide mathematical background, parallel algorithms, and implementation details leading to reusable, adaptable, and scalable code fragments. The book will serve as a GPU implementation manual for many image processing and analysis algorithms, providing valuable insights into parallelization strategies for GPUs as well as ready-to-use code fragments, with a broad appeal to both developers and researchers interested in GPU computing.
A new book titled “Numerical Computations with GPUs” has been published:
This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including the solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and engineering computations. Each chapter is written by authors working on a specific group of methods; these leading experts provide mathematical background, parallel algorithms and implementation details leading to reusable, adaptable and scalable code fragments. This book also serves as a GPU implementation manual for many numerical algorithms, sharing tips on GPUs that can increase application efficiency. The valuable insights into parallelization strategies for GPUs are supplemented by ready-to-use code fragments. Numerical Computations with GPUs targets professionals and researchers working in high performance computing and GPU programming. Advanced-level students focused on computer science and mathematics will also find this book useful as a secondary textbook or reference.
Offered in partnership with NVIDIA, this four-day CUDA training course, held in Houston, is designed for programmers in the oil and gas industry who are looking to develop comprehensive skills in writing and optimizing applications that fully leverage the many-core processing capabilities of the GPU. Commonly used algorithms such as filtering and FFTs will be used and profiled in the examples. The case study on day 4 focuses on the efficient implementation of a finite difference algorithm that is highly applicable to reverse time migration. However, a background in oil and gas is not necessary. For more information and to view a copy of the course outline, please visit: http://acceleware.com/training/987
Boost.Compute is a header-only C++ library for GPGPU and parallel computing based on OpenCL. It provides a low-level C++ wrapper over OpenCL and a high-level STL-like API with containers and algorithms for the GPU. It is available on GitHub, and instructions for getting started can be found in the documentation. See the full announcement here: http://kylelutz.blogspot.com/2014/07/boost-compute-v0.3-released.html
A new version of the rCUDA middleware has been released (version 4.2). In addition to fixing some minor bugs, the new release provides support for:
- CUDA 6.0 Runtime API
- New stream management
- cuSPARSE libraries
The rCUDA middleware lets you seamlessly use, within your cluster, GPUs that are installed in computing nodes other than the one executing the CUDA application, without having to modify your program. Please visit www.rcuda.net for more details about the rCUDA technology.
Many current high-performance clusters include one or more GPUs per node in order to dramatically reduce application execution time, but the utilization of these accelerators is usually far below 100%. In this context, remote GPU virtualization can help to reduce acquisition costs as well as the overall energy consumption. In this paper, we investigate the potential overhead and bottlenecks of several “heterogeneous” scenarios consisting of client GPU-less nodes running CUDA applications and remote GPU-equipped server nodes providing access to NVIDIA hardware accelerators. The experimental evaluation is performed using three general-purpose multicore processors (Intel Xeon, Intel Atom and ARM Cortex A9), two graphics accelerators (NVIDIA GeForce GTX480 and NVIDIA Quadro M1000), and two relevant scientific applications (CUDASW++ and LAMMPS) arising in bioinformatics and molecular dynamics simulations.
(A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, “On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications”. Fourth International Conference on Smart Grids, Green Communications and IT Energy-aware Technologies, ENERGY 2014, Chamonix (France), pp. 57–62, 20 – 24 April 2014. [PDF])
We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has a hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory references according to the location of the viewpoint and dynamically selects the width and height of thread blocks (TBs) so that each warp, which is a series of 32 threads processed simultaneously, can minimize memory access strides. We also incorporate transposed indexing of threads to perform TB-level cache optimization for specific viewpoints. Furthermore, we maximize TB size to exploit spatial locality with fewer resident TBs. For viewpoints with relatively large strides, we synchronize threads of the same TB at regular intervals to realize synchronous ray propagation. Experimental results indicate that our cache-aware method doubles the worst-case rendering performance compared with the implementations provided by the CUDA and OpenCL software development kits.
(Yuki Sugimoto, Fumihiko Ino, and Kenichi Hagihara: “Improving Cache Locality for GPU-based Volume Rendering”. Parallel Computing 40(5/6): 59-69, May 2014. [DOI])
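The idea of choosing a thread block width so that a warp touches as few cache lines as possible can be illustrated with a toy model. The sketch below is not the authors' method: it assumes a linear memory layout (real GPU textures are typically stored differently), a hypothetical cache line of 32 elements, and hypothetical per-viewpoint strides `stride_x` and `stride_y` giving the address distance between samples fetched by horizontally and vertically adjacent threads. The function names are invented for illustration.

```python
WARP_SIZE = 32  # threads per warp, as stated in the abstract


def warp_cache_lines(block_w, stride_x, stride_y, line_elems=32):
    """Count distinct cache lines touched by the first warp of a thread block.

    Toy model (assumption): thread (tx, ty) reads element
    tx * stride_x + ty * stride_y of a linear array, and a cache
    line holds line_elems consecutive elements.
    """
    lines = set()
    for lane in range(WARP_SIZE):
        # Threads in a block are linearized x-fastest, so the block
        # width decides how a warp is folded into rows.
        tx, ty = lane % block_w, lane // block_w
        address = tx * stride_x + ty * stride_y
        lines.add(address // line_elems)
    return len(lines)


def best_block_width(stride_x, stride_y, widths=(1, 2, 4, 8, 16, 32)):
    """Pick the candidate block width whose warp touches the fewest lines."""
    return min(widths, key=lambda w: warp_cache_lines(w, stride_x, stride_y))
```

In this model, a viewpoint with `stride_x=1, stride_y=512` favors a wide block (`best_block_width(1, 512)` returns 32), while the transposed case `stride_x=512, stride_y=1` favors a narrow one (`best_block_width(512, 1)` returns 1), which is the intuition behind selecting block shapes per viewpoint.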
Webinar: Next Steps for Folding@Home - a Distributed Computing Project for Protein Folding, by Vijay Pande
June 3rd, 2014
Folding@Home is a large-scale volunteer distributed computing project started in 2000 by Vijay Pande at Stanford. For over a decade, Professor Pande’s group has increased the computing power of Folding@Home through the development of new software algorithms and infrastructure, such as the incorporation of new hardware innovations like GPUs. That tremendous computing power has enabled significant advances in the simulation and understanding, at the molecular scale, of Alzheimer’s disease, malaria, various cancers, and other diseases. Professor Pande will give a brief introduction to Folding@Home and the successes of the project so far. He will also discuss plans to greatly enhance Folding@Home capabilities through new initiatives. This webinar is planned for June 3rd, 2014 at 9:00 AM Pacific Time. Register at: http://bit.ly/FolHome
The conference focuses on the application of GPUs in High Energy Physics (HEP), expanding on the trend of previous workshops on the topic and aiming to establish a recurrent series. The emerging paradigm of using graphics processors as powerful accelerators in data- and computation-intensive applications has found fertile ground in the computing challenges of the HEP community and is currently the object of active investigation. This follows a long-established trend of increased use of cheap off-the-shelf commercial units to achieve unprecedented performance in parallel data processing, leveraging the very strong commitment of hardware producers to the huge market of computer graphics and games. These hardware advances come together with the continuous development of proprietary and free software to expose the raw computing power of GPUs for general-purpose applications, and scientific computing in particular. All the different applications of massively parallel computing in HEP will be addressed, from computational speed-ups in online and offline data selection and analysis, to hard real-time applications in low-level triggering, to Monte Carlo simulations for lattice QCD. Both current activities and plans for foreseen experiments and projects will be discussed, together with perspectives on the evolution of the hardware and software.
The conference will be held in Pisa (Italy) from 10 to 12 September 2014. More information: http://www.pi.infn.it/gpu2014
Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful GPUs to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm³ brain template in 4–6 s, and run a second-level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under the GNU GPLv3 and can be downloaded from GitHub: https://github.com/wanderine/BROCCOLI.
(A. Eklund, P. Dufort, M. Villani and S. LaConte: “BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs”. Front. Neuroinform. 8:24, 2014. [DOI])
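For readers unfamiliar with permutation tests, the following minimal sketch shows how a null distribution can be estimated by shuffling group labels. It is a generic two-sample test on a single measurement written with the Python standard library only; it is not taken from BROCCOLI, which runs its tests in OpenCL across a whole brain volume. The function name, the difference-of-means statistic, and the add-one p-value correction are illustrative assumptions.

```python
import random


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Illustrative sketch only (assumption): labels are shuffled n_perm
    times, and the p-value is the fraction of shuffles whose absolute
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    def mean_diff(a, b):
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        if abs(mean_diff(pooled[:n_a], pooled[n_a:])) >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)
```

Each permutation is independent of the others, which is why this kind of test parallelizes well. In a neuroimaging setting the statistic would be computed per voxel rather than on a single list of numbers.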