The 1st International Workshop on OpenCL (IWOCL) will be held on May 13th and 14th at the Georgia Institute of Technology in Atlanta, Georgia. IWOCL is an annual meeting of vendors, researchers and developers to promote the evolution and advancement of the OpenCL standard. The first workshop has a full program: a day of tutorials, followed by a day of keynotes, papers, and panels. More information can be found here: http://iwocl.org.
We present an interface and an implementation of the General Matrix Multiply (GEMM) routine for multiple small matrices processed simultaneously on NVIDIA graphics processing units (GPUs). We focus on matrix sizes under 16. The implementation can be easily extended to larger sizes. For single precision matrices, our implementation is 30% to 600% faster than the batched cuBLAS implementation distributed in the CUDA Toolkit 5.0 on NVIDIA Tesla K20c. For example, we obtain 104 GFlop/s and 216 GFlop/s when multiplying 100,000 independent matrix pairs of size 10 and 16, respectively. Similar improvement in performance is obtained for other sizes, in single and double precision for real and complex types, and when the number of matrices is smaller. Apart from our implementation, our different function interface also plays an important role in the improved performance. Applications of this software include Finite Element computation on GPUs.
(Chetan Jhurani and Paul Mullowney, “A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices”, submitted to Journal of Parallel and Distributed Computing, April 2013. [preprint])
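As a rough illustration of what a batched GEMM computes and what the quoted rates imply, here is a minimal pure-Python sketch. It is not the authors' interface or implementation. The function names `gemm_batched` and `gemm_seconds` are hypothetical, and the count of 2·n³ floating-point operations per real matrix product is an assumed convention that the abstract does not state.

```python
def gemm_batched(a_batch, b_batch):
    """Reference batched GEMM: multiply each pair of small square matrices.

    Each matrix is a list of rows. On a GPU, every pair would be handled
    by its own group of threads; here the pairs are simply looped over.
    """
    out = []
    for a, b in zip(a_batch, b_batch):
        n = len(a)
        out.append([
            [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ])
    return out


def gemm_seconds(n, batch, gflops):
    """Run time implied by a sustained rate, assuming 2*n^3 flops per product."""
    total_flops = 2 * n ** 3 * batch
    return total_flops / (gflops * 1e9)


if __name__ == "__main__":
    print(gemm_batched([[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]]))
    # Figures quoted in the abstract: 100,000 pairs of size 10 and 16.
    print(gemm_seconds(10, 100_000, 104))
    print(gemm_seconds(16, 100_000, 216))
```

Under that flop-count assumption, the quoted 216 GFlop/s for 100,000 pairs of size 16 corresponds to a few milliseconds for the whole batch, which is why per-call overhead and the shape of the function interface matter at these sizes.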
Agent-Based Simulation Models are an increasingly popular tool for research and management in many fields such as ecology, economics and sociology. In some fields, such as the social sciences, these models are seen as a key instrument of the generative approach, essential for understanding complex social phenomena. The relevance and effectiveness of Agent-Based Simulation Models has also recently been recognized in policy-making, biology, military simulations, control of mobile robots and economics.
Several frameworks have been recently developed and are active in this field. They range from GPU-manycore approaches to parallel and/or distributed simulation environments.
The key objective of this workshop is to bring together researchers who are interested in getting more performance from their simulations by using:
- synchronized, many-core simulations (e.g., GPUs)
- strongly coupled, parallel simulations (e.g., MPI)
- loosely coupled, distributed simulations (distributed heterogeneous setting)
For details please visit http://www.padabs.org/
Northeastern University and Boston University, together with NVIDIA, are hosting a “GPUs Accelerating Research” Week next month.
On the first day, Wednesday 4/24, Northeastern is hosting a day of talks focused on how graphics processors are accelerating new and interesting areas of research in novel ways. The goal of this meeting is to provide a venue for both industry and academia to come together to discuss these innovations, and explore what lies ahead in GPU acceleration. Given that we have limited space in this one-day workshop, authors of papers not selected for presentation will have the option to present at a poster session to be held during the workshop. Please visit our website for registration and other details.
On the second day, Thursday 4/25, Boston University is hosting an all-day CUDA and OpenACC developer’s workshop. Prerequisites for getting the most out of this workshop are a basic understanding of C and the Linux command line. More details can be found here.
The use of graphics processing units (GPUs) has recently been proposed in computational electromagnetics to accelerate the solution of the electric field integral equation. In these methods, the linear systems obtained by using boundary elements are considered, and then an accelerated solution for a specific excitation is obtained. The existing studies are mostly focused on speeding up the filling time or the LU decomposition of that matrix. This limits the application to simple simulation scenarios if a fast method is not employed. In this paper, we propose a GPU acceleration for FFT-based integral equation solvers. We investigate the operations involved in the solver and motivate the use of GPUs. Results of numerical tests are reported first on a perfect electric conductor sphere with different radii; then a realistic aircraft is considered. We found that using GPUs for FFT-based methods allows achieving a reasonable speed-up.
(Elia A. Attardo, Matteo A. Francavilla, Francesca Vipiana and Giuseppe Vecchi: “Investigation on Accelerating FFT-Based Methods for the EFIE on Graphics Processors”, International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, to appear, Nov. 2012. [DOI])
The GPU Debayer software developed by Fastvideo can be used for demosaicing of raw 8-bit Bayer images to full-color 24-bit RGB format. The application employs the HQLI and DFPD algorithms and is tuned for NVIDIA GPUs, which results in very fast conversion, e.g., only 1.25 ms for Full HD image demosaicing on GeForce GTX 580. The software is freely available.
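To put the quoted timing in perspective, the following back-of-the-envelope sketch converts it into throughput. It assumes that "Full HD" means 1920×1080 pixels, that the input is one byte per pixel and the output three bytes per pixel, and it says nothing about whether the 1.25 ms figure includes host-device transfers. The function name `debayer_throughput` is hypothetical and is not part of the Fastvideo software.

```python
def debayer_throughput(width, height, seconds):
    """Pixel and memory throughput implied by one demosaicing pass."""
    pixels = width * height
    bytes_in = pixels        # 8-bit Bayer input: one byte per pixel
    bytes_out = pixels * 3   # 24-bit RGB output: three bytes per pixel
    return {
        "megapixels_per_s": pixels / seconds / 1e6,
        "gigabytes_per_s": (bytes_in + bytes_out) / seconds / 1e9,
    }


if __name__ == "__main__":
    # Assumed frame size: 1920x1080; timing taken from the announcement.
    print(debayer_throughput(1920, 1080, 1.25e-3))
```

With these assumptions, 1.25 ms per frame corresponds to roughly 1.66 gigapixels per second.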
Due to the ever-increasing demand for fast processing of large analytical workloads, main-memory column-oriented databases have attracted a lot of attention in recent years. In-memory databases eliminate the disk I/O barrier by storing the data in memory. In addition, they utilize a column-oriented data layout to offer a multi-core-friendly and memory-bandwidth-efficient processing scheme. At the same time, graphics processing units (GPUs) have recently emerged as powerful tools for general high-performance computing. GPUs are affordable and energy-efficient devices that deliver massive computational power by utilizing a large number of cores and a high memory bandwidth. GPUs can be used as co-processors for query acceleration of in-memory databases. One of the main bottlenecks in GPU acceleration of in-memory databases is the need for data to be transferred back and forth between GPU memory and RAM through a low-bandwidth PCIe bus. To address this problem, this study proposes a new generation of in-memory databases that, instead of keeping data in main memory, stores it in GPU device memory.
(Pedram Ghodsnia: “An In-GPU-Memory Column-Oriented Database for Processing Analytical Workloads”, VLDB 2012 PhD Workshop, Istanbul, Turkey, August 2012. [PDF])
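The column-oriented layout mentioned in the abstract can be pictured with a small sketch. This is only an illustration of the general idea, not the proposed database: the sample table, the column names, and the helpers `to_columns` and `sum_where` are all hypothetical. Each column is stored as one contiguous sequence, so an analytical query reads only the columns it needs.

```python
# Hypothetical sample table in row-oriented form.
ROWS = [
    {"id": 1, "region": "EU", "sales": 120.0},
    {"id": 2, "region": "US", "sales": 75.5},
    {"id": 3, "region": "EU", "sales": 80.0},
]


def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_where(columns, value_col, filter_col, wanted):
    """Sum one column over the rows where another column equals `wanted`.

    Only the two named columns are scanned; the other columns are never read.
    """
    return sum(
        value
        for value, key in zip(columns[value_col], columns[filter_col])
        if key == wanted
    )


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(sum_where(cols, "sales", "region", "EU"))
```

A scan like `sum_where` touches long contiguous arrays with the same operation per element, which is the access pattern that benefits from high memory bandwidth, whether on a multi-core CPU or in GPU device memory.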
Recently, general-purpose computing on graphics processing units (GPGPU) has been enabled on mobile devices thanks to the emerging heterogeneous programming models such as OpenCL. The capability of GPGPU on mobile devices opens a new era for mobile computing and can enable many computationally demanding computer vision algorithms on mobile devices. As a case study, this paper proposes to accelerate an exemplar-based inpainting algorithm for object removal on a mobile GPU using OpenCL. We discuss the methodology of exploring the parallelism in the algorithm as well as several optimization techniques. Experimental results demonstrate that our optimization strategies for mobile GPUs have significantly reduced the processing time and make computationally intensive computer vision algorithms feasible for a mobile device. To the best of the authors’ knowledge, this work is the first published implementation of general-purpose computing using OpenCL on mobile GPUs.
(Guohui Wang, Yingen Xiong, Jay Yun and Joseph R. Cavallaro: “Accelerating Computer Vision Algorithms Using OpenCL on the Mobile GPU – A Case Study”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013, to appear. [PDF])
As the word “UnConventional” in the title suggests, the workshop focuses on hardware or platforms used for HPC that were not intended for HPC in the first place. Reasons for using them could be raw computing power, good performance per watt, or low cost in general. Making the best use of this unconventional hardware often requires new programming approaches and paradigms. A second focus of the workshop is on innovative, (yet) unconventional new programming models. To this end, UCHPC tries to capture solutions for HPC which are unconventional today but could become conventional and significant tomorrow, and thus provide a glimpse into the eventual future of HPC. The goal of the workshop is to present the latest research on how hardware and software that are (yet) unconventional for HPC are or can be used to reach goals such as best performance per watt. UCHPC also covers the corresponding programming models, compiler techniques, and tools.
UCHPC is held in conjunction with Euro-Par 2013, August 26 – August 30, Aachen, Germany. More information, including the full call for papers, submission instructions and important dates: uchpc13.cs.tum.edu
The following new webinars about NVIDIA Tesla K20 have been announced. During these live webinars, developers will be able to get answers directly from the presenters.