Proceedings from the 2nd International Workshop on High Performance and Hardware-Aware Computing (HIPHAC 2011) are now available from KIT Scientific Publishing. Individual copies can be ordered here, and the electronic proceedings are available free of charge.
FGC 2011, the Second International Workshop on Frontier of GPU Computing, will be held in conjunction with CSE 2011 in Dalian, China, 24 – 26 August, 2011. More information can be found at http://www.comp.hkbu.edu.hk/~chxw/fgc2011/index.php.
The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (co-located with ICS, June 4, 2011) is intended to provide evaluations of the characteristics of computational kernels and applications, and how different software stacks impact them, to guide future accelerator-based HPC system designs.
We solicit papers on all aspects of HPC application studies, especially those that involve accelerators such as GPUs, FPGAs, etc. The topics include (but are not limited to):
- Categorizing and characterizing HPC applications and kernels with respect to patterns in computation structure, communication, cache accesses, memory, I/O, and file accesses.
- Evaluating the importance of individual kernels within an entire application.
- Modeling for applications running on accelerator-based heterogeneous HPC systems.
- Implications of workload characterization for heterogeneous system design.
- Benchmarking of applications, kernels or software stacks and tools supporting applications.
The call for papers and more details about the workshop may be found on the website.
GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method
February 13th, 2011
The paper discusses a fast implementation of the conjugate gradient iterative method with an E-field multilevel preconditioner, applied to solving the real, symmetric, sparse systems obtained with the vector finite element method. To accelerate computations, a graphics processing unit (GPU) was used, and a significant speed-up (2.61-fold) was achieved compared to a central processing unit (CPU) based approach. These results indicate that the performance of electromagnetic simulations can be significantly improved, thereby enabling full-wave optimization of microwave components in more manageable time.
(A. Dziekonski, A. Lamecki and M. Mrozowski: “GPU Acceleration of Multilevel Solvers for Analysis of Microwave Components With Finite Element Method”, IEEE Microwave and Wireless Components Letters 21(1) pp.1-3, Jan. 2011. [DOI])
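To make the solver structure concrete, here is a minimal pure-Python sketch of a preconditioned conjugate gradient iteration. It is not the paper's implementation: it assumes a small dense symmetric positive definite matrix, and it swaps in a simple Jacobi (diagonal) preconditioner where the paper uses an E-field multilevel preconditioner. The names `pcg` and `jacobi` are hypothetical.

```python
import math


def jacobi(A):
    """Return a diagonal (Jacobi) preconditioner for A.

    Assumption: this is a stand-in for the paper's multilevel preconditioner.
    """
    diagonal = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diagonal)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of rows).

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                       # residual b - A x for the start x = 0
    z = apply_preconditioner(r)       # preconditioned residual
    p = list(z)                       # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for iteration in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz            # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small illustrative system (values chosen arbitrarily for the example).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, iterations = pcg(A, b, jacobi(A))
```

In a GPU version, the matrix-vector product and the vector updates are the natural candidates for parallel kernels, since each output entry can be computed independently.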
Proteins, nucleic acids, and small molecules form a dense network of molecular interactions in a cell. The architecture of molecular networks can reveal important principles of cellular organization and function, similarly to the way that protein structure tells us about the function and organization of a protein. Protein complexes are groups of proteins that interact with each other at the same time and place, forming a single multimolecular machine. Functional modules, in contrast, consist of proteins that participate in a particular cellular process while binding each other at a different time and place.
A protein-protein interaction network is represented as a graph in which proteins are nodes and interactions between proteins are edges. Protein complexes and functional modules can be identified as highly interconnected subgraphs, and computational methods are now indispensable for detecting them from protein interaction data. In addition, high-throughput screening techniques such as yeast two-hybrid screening enable the identification of detailed protein-protein interaction maps in multiple species. As the interaction dataset grows, the scale of interconnected protein networks increases exponentially, so the increasing complexity of the networks poses computational challenges for their analysis. Read the rest of this entry »
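As a rough illustration of the graph representation described above, the following sketch stores an interaction network as adjacency sets and scores a candidate group of proteins by its edge density (the fraction of possible pairs that actually interact). The protein names, the function names, and the use of density as the measure of "highly interconnected" are assumptions made for this example; the post does not say which detection method is used.

```python
from itertools import combinations


def build_network(interactions):
    """Build an undirected graph: protein -> set of interaction partners."""
    adjacency = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def subgraph_density(adjacency, proteins):
    """Fraction of protein pairs in `proteins` that are connected by an edge."""
    proteins = list(proteins)
    possible_pairs = len(proteins) * (len(proteins) - 1) // 2
    if possible_pairs == 0:
        return 0.0
    present = sum(
        1 for a, b in combinations(proteins, 2) if b in adjacency.get(a, ())
    )
    return present / possible_pairs


# Hypothetical interaction list: P1, P2, P3 all bind each other; P4 and P5 form a chain.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
network = build_network(interactions)
dense_group = subgraph_density(network, ["P1", "P2", "P3"])
sparse_group = subgraph_density(network, ["P3", "P4", "P5"])
```

A fully connected group scores 1.0, so candidate complexes can be ranked by how close they come to that value.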
GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory (ADSM) model to be used by CUDA programs. An ADSM model builds a global memory space that allows CPU code to transparently access data hosted in accelerators’ (GPUs’) memories. Moreover, the coherency of the data is automatically handled by the library. This removes the need for manual memory transfers (cudaMemcpy) between the host and GPU memories. Furthermore, GMAC assigns a different “virtual GPU” to each host thread, and the virtual GPUs are evenly mapped to physical GPUs. This is especially useful for multi-GPU programs, since each host thread can access the memory of all GPUs and GPU-to-GPU transfers can be performed with plain memcpy calls. Read the rest of this entry »
We examine the problem of segmenting foreground objects in live video when background scene textures change over time. In particular, we formulate background subtraction as minimizing a penalized instantaneous risk functional, yielding a local on-line discriminative algorithm that can quickly adapt to temporal changes. We analyze the algorithm’s convergence, discuss its robustness to non-stationarity, and provide an efficient non-linear extension via sparse kernels. To accommodate interactions among neighboring pixels, a global algorithm is then derived that explicitly distinguishes objects versus background using maximum a posteriori inference in a Markov random field (implemented via graph-cuts). By exploiting the parallel nature of the proposed algorithms, we develop an implementation that can run efficiently on the highly parallel Graphics Processing Unit (GPU). Empirical studies on a wide variety of datasets demonstrate that the proposed approach achieves quality that is comparable to state-of-the-art off-line methods, while still being suitable for real-time video analysis (75 fps on a mid-range GPU).
The Parallel Processing for Imaging Applications conference, part of IS&T/SPIE’s Electronic Imaging conference, was held on January 24–25 in San Francisco. The conference had a large number of GPU papers (SPIE digital library link):
- Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications by Broussard and Ives
- GPGPU real-time texture analysis framework by Akhloufi et al.
- A parallel implementation of 3D Zernike moment analysis by Berjón et al.
- Visualization assisted by parallel processing by Lange et al.
- GPU color space conversion by Chase and Vondran
- Acceleration of the Retinex algorithm for image restoration by GPGPU/CUDA by Wang and Huang
- Video transcoding using GPU accelerated decoder by Hsu
- Real-time image deconvolution on the GPU by Klosowski and Krishnan
- GPU-completeness: theory and implications by Lin
- A parallel error diffusion implementation on a GPU by Zhang et al.
- Evaluation of CPU and GPU architectures for spectral image analysis algorithms by Fresse et al.
- Real-time 3D flash ladar imaging through GPU data processing by Wong et al.
- Advanced MRI reconstruction toolbox with accelerating on GPU by Wu et al.
- Accelerating image recognition on mobile devices using GPGPU by López et al.
- A GPU accelerated PDF transparency engine by Recker et al.
We implemented a GPU-based parallel code to perform Monte Carlo simulations of the two-dimensional q-state Potts model. The algorithm is based on a checkerboard update scheme and assigns independent random number generators to each thread (one thread per spin). The implementation allows simulating systems of up to ~10^9 spins with an average time per spin flip of 0.147 ns on the fastest GPU card tested, representing a speedup of up to 155x compared with an optimized serial code running on a standard CPU. The possibility of performing high-speed simulations at large enough system sizes allowed us to provide positive numerical evidence for the existence of metastability in very large systems based on Binder’s criterion, namely, on the existence or not of specific heat singularities at spinodal temperatures different from the transition temperature.
(Ezequiel E. Ferrero, Juan Pablo De Francesco, Nicolás Wolovick and Sergio A. Cannas: “q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms”. [arXiv:1101.0876] [code and additional information])
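The checkerboard scheme mentioned in the abstract can be sketched serially in a few lines. This is only an illustration, not the authors' code: it assumes a Metropolis acceptance rule, coupling J = 1, periodic boundaries, and an even lattice size, and it uses one shared `random.Random` where the GPU code assigns an independent generator to each thread. The function names are hypothetical.

```python
import math
import random


def potts_energy(spins):
    """Energy with J = 1: minus the number of equal nearest-neighbor pairs."""
    size = len(spins)
    return -sum(
        (spins[i][j] == spins[(i + 1) % size][j])
        + (spins[i][j] == spins[i][(j + 1) % size])
        for i in range(size)
        for j in range(size)
    )


def checkerboard_sweep(spins, q, temperature, rng):
    """One Metropolis sweep, updating the two checkerboard colors in turn.

    Sites of one color only have neighbors of the other color (for an even
    lattice size), so all sites of a color could be updated in parallel.
    """
    size = len(spins)
    for color in (0, 1):
        for i in range(size):
            for j in range(size):
                if (i + j) % 2 != color:
                    continue
                neighbors = (
                    spins[(i - 1) % size][j],
                    spins[(i + 1) % size][j],
                    spins[i][(j - 1) % size],
                    spins[i][(j + 1) % size],
                )
                old = spins[i][j]
                new = rng.randrange(q)          # proposed new state
                # Energy change: equal neighbors lost minus equal neighbors gained.
                delta_e = neighbors.count(old) - neighbors.count(new)
                if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                    spins[i][j] = new


# Illustrative run on a small lattice (parameters chosen arbitrarily).
rng = random.Random(42)
size, q = 16, 4
lattice = [[rng.randrange(q) for _ in range(size)] for _ in range(size)]
for _ in range(10):
    checkerboard_sweep(lattice, q, temperature=0.5, rng=rng)
```

Because no two sites of the same color are neighbors, the inner loops over one color carry no data dependencies, which is what makes a one-thread-per-spin mapping possible.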
Although trivial background subtraction (BGS) algorithms (e.g. frame differencing, running average…) can run quite fast, they are not robust enough to be used in various computer vision problems. More complex algorithms usually give better results, but are too slow to be applied to real-time systems. We propose an improved version of the Extended Gaussian mixture model that utilizes the computational power of Graphics Processing Units (GPUs) to achieve real-time performance. Experiments show that our implementation running on a low-end GeForce 9600GT GPU provides at least 10x speedup. The frame rate is greater than 50 frames per second (fps) for most of the tests, even on HD video formats.
(Vu Pham, Phong Vo, Vu Thanh Hung and Le Hoai Bac: “GPU Implementation of Extended Gaussian Mixture Model for Background Subtraction”. IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010. [DOI] [code and additional information])
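For context, the running-average baseline that the abstract calls trivial can be sketched as follows. This is not the Extended Gaussian mixture model from the paper; it only shows the kind of simple per-pixel method the paper improves on. Frames are assumed to be grayscale images given as lists of rows, and the function name, `learning_rate`, and `threshold` are hypothetical choices for this example.

```python
def running_average_bgs(frames, learning_rate=0.05, threshold=30):
    """Return one foreground mask per frame after the first.

    The first frame initializes the background model. Each later pixel is
    marked foreground when it differs from the model by more than
    `threshold`, and the model is then blended toward the new frame.
    """
    frames = iter(frames)
    background = [[float(pixel) for pixel in row] for row in next(frames)]
    masks = []
    for frame in frames:
        mask = []
        for bg_row, row in zip(background, frame):
            mask_row = []
            for x, pixel in enumerate(row):
                mask_row.append(abs(pixel - bg_row[x]) > threshold)
                # Exponential running average of the background intensity.
                bg_row[x] = (1.0 - learning_rate) * bg_row[x] + learning_rate * pixel
            mask.append(mask_row)
        masks.append(mask)
    return masks


# Tiny synthetic sequence: a static scene, then a bright object in one pixel.
frames = [
    [[10, 10], [10, 10]],
    [[10, 10], [10, 10]],
    [[10, 200], [10, 10]],
]
masks = running_average_bgs(frames)
```

Every pixel is processed independently of its neighbors, which is why per-pixel background models of this family map well onto GPU threads.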