Folding@Home is a large-scale volunteer distributed computing project started in 2000 by Vijay Pande, Stanford. For over a decade, Professor Pande’s group has increased the computing power of Folding@Home through the development of new software algorithms and infrastructure, such as the incorporation of new hardware innovations like GPUs. That tremendous computing power has enabled significant advances in the simulation and understanding of diseases like Alzheimer’s Disease, malaria, various cancers, and other diseases at the molecular scale. Professor Pande will give a brief introduction to Folding@Home and the successes in the project so far. He will also discuss plans to greatly enhance Folding@Home capabilities through new initiatives. This webinar is planned for June 3rd, 2014 at 9.00 AM Pacific Time. Register at: http://bit.ly/FolHome

## Webinar: Next Steps for Folding@Home — a Distributed Computing Project for Protein Folding, by Vijay Pande

June 3rd, 2014## CfP: GPU in High Energy Physics 2014

June 3rd, 2014The conference focuses on the application of GPUs in High Energy Physics (HEP), expanding on the trend of previous workshops on the topic and pointing to establishing a recurrent series. The emerging paradigm of the use of graphic processors as powerful accelerators in data- and computation-intensive applications found fertile ground in the computing challenges of the HEP community and is currently object of active investigations. This follows a long established trend which sees the increased use of cheap off-the-shelf commercial units to achieve unprecedented performances in parallel data processing, thus leveraging on a very strong commitment of hardware producers to the huge market of computer graphics and games. These hardware advances comes together with the continuous development of proprietary and free software to expose the raw computing power of GPUs for general-purpose applications and scientific computing in particular. All different applications of massively parallel computing in HEP will be addressed, from computational speed-ups in online and offline data selection and analysis to hard real-time applications in low-level triggering, to MonteCarlo simulations for lattice QCD. Both current activities and plans for foreseen experiments and projects will be discussed, together with perspectives on the evolution of the hardware and software.

The conference is held in Pisa (Italy), 10.9.2014 – 12.9.2014. More information: http://www.pi.infn.it/gpu2014

## BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs

May 27th, 2014Abstract:

Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful GPUs to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm3 brain template in 4–6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github: https://github.com/wanderine/BROCCOLI.

(A. Eklund, P. Dufort, M. Villani and S. LaConte: *“BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs”*. Front. Neuroinform. 8:24, 2014. [DOI])

## Master’s thesis: Parallel Computing for Digital Signal Processing on Mobile Device GPUs

May 27th, 2014This master’s thesis by Markus Konrad analyzes the potentials of GPGPU on mobile devices such as smartphones or tablets. The question was, if and how the GPU on such devices can be used to speed up certain algorithms especially in the fields of image processing. GPU computing technologies such as OpenCL, OpenGL shaders, and Android RenderScript are assessed in the thesis. The abstract reads as follows:

This thesis studies how certain popular algorithms in the field of image and audio processing can be accelerated on mobile devices by means of parallel execution on their graphics processing unit (GPU). Several technologies with which this can be achieved are compared in terms of possible performance improvements, hardware and software support, as well as limitations of their programming model and functionality. The results of this research are applied in a practical project, consisting of performance improvements for marker detection in an Augmented Reality application for mobile devices.

The PDF is available for download and the source-code for some Android application prototypes is published on github.

## PARALUTION 0.7.0 released

May 27th, 2014PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi. The new 0.7.0 version provides the following new features:

- Windows support – full windows support for all backends (CUDA, OpenCL, OpenMP)
- Assembling function – new OpenMP parallel assembling function for sparse matrices (includes an update function for time-dependent problems)
- Direct (dense) solvers (for very small problems)
- (Restricted) Additive Schwarz preconditioners
- MATLAB/Octave plug-in

To avoid OpenMP overhead for small sized problems, the library will compute in serial if the size of the matrix/vector is below a pre-defined threshold. Internally, the OpenCL backend has been modified for simplified cross platform compilation.

## Webinar, May 20th: Accelerating GIS Big Data Processing

May 18th, 2014Join the free webinar on May 20th devoted to accelerating orthorectification, atmospheric correction, and transformations for big data with GPUs. Learn how GPU capabilities can improve time for processing large imagery 50-100 times faster. Amanda O’Connor, a Senior Solutions Engineer at Exelis will walk you through implementation of GPU processing for large imagery datasets, operational use of GPU processing for orthorectification and share benchmarks against desktop algorithms. To register follow this link: https://www2.gotomeeting.com/register/665929994.

## Boost.Compute v0.2 Released

May 15th, 2014Boost.Compute v0.2 has been released! Boost.Compute is a header-only C++ library for GPGPU and parallel-computing based on OpenCL. It is available on GitHub and instructions for getting started can be found in the documentation. Since version 0.1 (released almost two months ago) new algorithms including unique(), search() and find_end() have been added, along with several bug fixes. See the project page on GitHub for more information: https://github.com/kylelutz/compute

## GPUPROF 0.3 Released

May 15th, 2014A new version of the GPU-profiler for CUDA software stack is available at www.lab4241.com. The GPU-profiler is able to deliver per C++ source-code ‘inside’ kernel performance information in a simple, intuitive way, similar to known CPU domain profilers, like Quantify or Valgrind. The new version, GPUPROF version 0.3 (beta), includes improved stability, refined memory tracing, temporal memory analysis, and CUDA API-driver call tracing.

## Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor

May 5th, 2014Abstract:

Frequent itemset mining (FIM) is a core area for many data mining applications as association rules computation, clustering and correlations, which has been comprehensively studied over the last decades. Furthermore, databases are becoming gradually larger, thus requiring a higher computing power to mine them in reasonable time. At the same time, the improvements in high performance computing platforms are transforming them into massively parallel environments equipped with multi-core processors, such as GPUs. Hence, fully operating these systems to perform itemset mining poses as a challenging and critical problems that addressed by various researcher. We present survey of multi-core and GPU accelerated parallelization of the FIM algorithms.

(Dharmesh Bhalodiya and Chhaya patel: *“Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor”*. International Journal of Engineering Research and Applications 4(4):159-163, April 2014. [PDF])

## Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing

April 29th, 2014Abstract :

Spectral unmixing is an important task in remotely sensed hyperspectral data exploitation. The linear mixture model has been widely used to unmix hyperspectral images by identifying a set of pure spectral signatures, called endmembers, and estimating their respective abundances in each pixel of the scene. Several algorithms have been proposed in the recent literature to automatically identify endmembers, even if the original hyperspectral scene does not contain any pure signatures. A popular strategy for endmember identification in highly mixed hyperspectral scenes has been the minimum volume simplex analysis (MVSA), known to be a computationally very expensive algorithm. This algorithm calculates the minimum volume enclosing simplex, as opposed to other algorithms that perform maximum simplex volume analysis (MSVA). The high computational complexity of MVSA, together with its very high memory requirements, has limited its adoption in the hyperspectral imaging community. In this paper we develop several optimizations to the MVSA algorithm. The main computational task of MVSA is the solution of a quadratic optimization problem with equality and inequality constraints, with the inequality constraints being in the order of the number of pixels multiplied by the number of endmembers. As a result, storing and computing the inequality constraint matrix is highly inefficient. The first optimization presented in this paper uses algebra operations in order to reduce the memory requirements of the algorithm. In the second optimization, we use graphics processing units (GPUs) to effectively solve (in parallel) the quadratic optimization problem involved in the computation of MVSA. In the third optimization, we extend the single GPU implementation to a multi-GPU one, developing a hybrid strategy that distributes the computation while taking advantage of GPU accelerators at each node. The presented optimizations are tested in different analysis scenarios (using both synthetic and real hyperspectral data) and shown to provide state-of-the-art results from the viewpoint of unmixing accuracy and computational performance. The speedup achieved using the full GPU cluster compared to the CPU implementation in tenfold in a real hyperspectral image.

(A. Agathos, J. Li, D. Petcu and A. Plaza: *“Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing”*. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, accepted for publication , 2014. [PDF] )