Webinar, May 20th: Accelerating GIS Big Data Processing

May 18th, 2014

Join the free webinar on May 20th devoted to accelerating orthorectification, atmospheric correction, and transformations for big data with GPUs. Learn how GPU capabilities can make processing of large imagery 50-100 times faster. Amanda O’Connor, a Senior Solutions Engineer at Exelis, will walk you through the implementation of GPU processing for large imagery datasets and the operational use of GPU processing for orthorectification, and will share benchmarks against desktop algorithms. To register, follow this link: https://www2.gotomeeting.com/register/665929994.

Boost.Compute v0.2 Released

May 15th, 2014

Boost.Compute v0.2 has been released! Boost.Compute is a header-only C++ library for GPGPU and parallel computing based on OpenCL. It is available on GitHub, and instructions for getting started can be found in the documentation. Since version 0.1 (released almost two months ago), new algorithms including unique(), search() and find_end() have been added, along with several bug fixes. See the project page on GitHub for more information: https://github.com/kylelutz/compute

GPUPROF 0.3 Released

May 15th, 2014

A new version of the GPU-profiler for the CUDA software stack is available at www.lab4241.com. The GPU-profiler delivers performance information from ‘inside’ a kernel, attributed to the C++ source code, in a simple, intuitive way, similar to well-known CPU-domain profilers such as Quantify or Valgrind. The new version, GPUPROF version 0.3 (beta), includes improved stability, refined memory tracing, temporal memory analysis, and CUDA API-driver call tracing.

Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor

May 5th, 2014


Frequent itemset mining (FIM) is a core area for many data mining applications, such as association rule computation, clustering, and correlation analysis, and has been comprehensively studied over the last decades. Furthermore, databases are becoming gradually larger, thus requiring more computing power to mine them in reasonable time. At the same time, improvements in high performance computing platforms are transforming them into massively parallel environments equipped with multi-core processors, such as GPUs. Hence, fully exploiting these systems for itemset mining is a challenging and critical problem that has been addressed by various researchers. We present a survey of multi-core and GPU-accelerated parallelizations of FIM algorithms.

(Dharmesh Bhalodiya and Chhaya Patel: “Comparative Study of Frequent Itemset Mining Techniques on Graphics Processor”. International Journal of Engineering Research and Applications 4(4):159-163, April 2014. [PDF])
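
To make the mining task concrete, here is a minimal sequential sketch of level-wise (Apriori-style) frequent itemset mining in Python. It is not taken from the surveyed paper: the function name `frequent_itemsets`, the toy transactions, and the `min_support` threshold are illustrative assumptions. The support-counting step consists of independent checks, which is the kind of work a parallel or GPU implementation could spread across threads.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support.

    Illustrative level-wise (Apriori-style) search; names are hypothetical.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        # Support counting: each candidate is checked against each transaction.
        # These checks are independent of one another.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build (k+1)-item candidates by joining frequent k-itemsets, keeping
        # only those whose k-item subsets are all frequent (Apriori pruning).
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


# Toy data, assumed for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
found = frequent_itemsets(transactions, min_support=2)
```

With this toy data, every single item and every pair is frequent, while the three-item set appears in only one transaction and is dropped.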

Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing

April 29th, 2014

Abstract:

Spectral unmixing is an important task in remotely sensed hyperspectral data exploitation. The linear mixture model has been widely used to unmix hyperspectral images by identifying a set of pure spectral signatures, called endmembers, and estimating their respective abundances in each pixel of the scene. Several algorithms have been proposed in the recent literature to automatically identify endmembers, even if the original hyperspectral scene does not contain any pure signatures. A popular strategy for endmember identification in highly mixed hyperspectral scenes has been the minimum volume simplex analysis (MVSA), known to be a computationally very expensive algorithm. This algorithm calculates the minimum volume enclosing simplex, as opposed to other algorithms that perform maximum simplex volume analysis (MSVA). The high computational complexity of MVSA, together with its very high memory requirements, has limited its adoption in the hyperspectral imaging community. In this paper we develop several optimizations to the MVSA algorithm. The main computational task of MVSA is the solution of a quadratic optimization problem with equality and inequality constraints, with the inequality constraints being in the order of the number of pixels multiplied by the number of endmembers. As a result, storing and computing the inequality constraint matrix is highly inefficient. The first optimization presented in this paper uses algebra operations in order to reduce the memory requirements of the algorithm. In the second optimization, we use graphics processing units (GPUs) to effectively solve (in parallel) the quadratic optimization problem involved in the computation of MVSA. In the third optimization, we extend the single GPU implementation to a multi-GPU one, developing a hybrid strategy that distributes the computation while taking advantage of GPU accelerators at each node. 
The presented optimizations are tested in different analysis scenarios (using both synthetic and real hyperspectral data) and shown to provide state-of-the-art results from the viewpoint of unmixing accuracy and computational performance. The speedup achieved using the full GPU cluster compared to the CPU implementation is tenfold on a real hyperspectral image.

(A. Agathos, J. Li, D. Petcu and A. Plaza: “Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing”. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, accepted for publication, 2014. [PDF])
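
As a small illustration of the linear mixture model described in the abstract, the sketch below builds one mixed pixel from two endmember signatures and counts the abundance inequality constraints for a scene. The function names, the toy signatures, and the scene size are assumptions made for illustration and are not taken from the paper; the sketch does not implement MVSA itself.

```python
def mix_pixel(endmembers, abundances):
    """Linear mixture: pixel[b] = sum_i abundances[i] * endmembers[i][b]."""
    # Abundances are fractions: nonnegative and summing to one.
    assert all(a >= 0 for a in abundances)
    assert abs(sum(abundances) - 1.0) < 1e-9
    bands = len(endmembers[0])
    return [
        sum(a * e[b] for a, e in zip(abundances, endmembers))
        for b in range(bands)
    ]


def inequality_constraint_count(num_pixels, num_endmembers):
    """One nonnegativity constraint per (pixel, endmember) pair."""
    return num_pixels * num_endmembers


# Two hypothetical endmember signatures over three spectral bands.
endmembers = [
    [0.0, 0.5, 1.0],
    [1.0, 0.5, 0.0],
]
pixel = mix_pixel(endmembers, [0.25, 0.75])

# A hypothetical 350 x 350 pixel scene with 10 endmembers.
constraints = inequality_constraint_count(350 * 350, 10)
```

Even this modest hypothetical scene yields over a million inequality constraints, which gives a sense of why storing the constraint matrix explicitly becomes costly.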

A high throughput efficient approach for decoding LDPC codes onto GPU devices

April 16th, 2014


The LDPC decoding process is known to be compute-intensive. This kind of digital communication application has recently been implemented on GPU devices for LDPC code performance estimation and/or real-time measurements. Overall, previous studies of LDPC decoding on GPUs were based on the flooding-based decoding algorithm, which provides massive computation parallelism. More efficient layered schedules have been proposed in the literature, since a decoder iteration can be split into sub-layer iterations. These schedules seem to fit GPU devices poorly due to restricted computation parallelism and complex memory access patterns. However, layered schedules speed up decoding convergence by a factor of two. In this letter, we show that (a) a layered schedule can be efficiently implemented on a GPU device and (b) this approach, implemented on a low-cost GPU device, provides higher throughput with identical correction performance (BER) compared to previously published results.

(B. Le Gal, C. Jégo and J. Crenne: “An high-throughput efficiency approach for GPU-based LDPC decoding”. IEEE Embedded Systems Letters, March 2014. [DOI])

Workshop on GPU Programming for Molecular Modeling, Urbana, IL, July 22-24, 2014

April 16th, 2014

The GPU Programming for Molecular Modeling workshop will extend GPU programming techniques to the field of molecular modeling, including subjects such as particle-grid algorithms (electrostatics, molecular surfaces, density maps, and molecular orbitals), particle-particle algorithms with an emphasis on non-bonded force calculations, radial distribution functions in GPU histogramming, single-node multi-GPU algorithms, and GPU clusters. Specific examples utilizing the NAMD and VMD software programs will be introduced and discussed in detail. The workshop is designed for researchers in computational and/or biophysical fields who seek to extend their GPU programming skills to include molecular modeling. Advanced lecture sessions will be followed by extended discussion periods between lecturers and participants and laboratory time in which students will be able to work on their own molecular modeling GPU codes. See workshop website for details and application: http://www.ks.uiuc.edu/Training/Workshop/GPU_Jul2014/

Efficient Multi-GPU Computation of All-Pairs Shortest Paths

April 2nd, 2014


We describe a new algorithm for solving the all-pairs shortest-path (APSP) problem for planar graphs and graphs with small separators that exploits the massive on-chip parallelism available in today’s Graphics Processing Units (GPUs). Our algorithm, based on the Floyd-Warshall algorithm, has near optimal complexity in terms of the total number of operations, while its matrix-based structure is regular enough to allow for efficient parallel implementation on the GPUs. By applying a divide-and-conquer approach, we are able to make use of multi-node GPU clusters, resulting in more than an order of magnitude speedup over the fastest known Dijkstra-based GPU implementation and a two-fold speedup over a parallel Dijkstra-based CPU implementation.

(Hristo Djidjev, Sunil Thulasidasan, Guillaume Chapuis, Rumen Andonov and Dominique Lavenier: “Efficient Multi-GPU Computation of All-Pairs Shortest Paths”. To appear in IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014. [PDF])
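
For readers unfamiliar with the baseline, the following is a minimal sequential sketch of the classic Floyd-Warshall algorithm that the abstract names as its starting point. It is not the paper's partitioned multi-GPU algorithm; the function name `floyd_warshall` and the small example graph are illustrative assumptions.

```python
import math


def floyd_warshall(num_vertices, edges):
    """All-pairs shortest-path distances for a weighted directed graph.

    edges is a list of (source, target, weight) tuples; the graph is assumed
    to contain no negative cycles.
    """
    dist = [[math.inf] * num_vertices for _ in range(num_vertices)]
    for v in range(num_vertices):
        dist[v][v] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    # For a fixed k, each (i, j) update is independent of the others, which
    # gives the loop nest its regular, matrix-like structure.
    for k in range(num_vertices):
        for i in range(num_vertices):
            for j in range(num_vertices):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Small example graph, assumed for illustration only.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
dist = floyd_warshall(4, edges)
```

In this example the direct edge from vertex 0 to vertex 1 costs 4, but the route through vertex 2 costs 3, so `dist[0][1]` ends up as 3.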

Webinar on April 8th: Geospatial 3D Visualization in the Cloud with GPUs

April 2nd, 2014

This webinar covers how Geoweb3d uses the GPU for real-time geospatial 3D visualization, modeling, and analytics. Geoweb3d will demonstrate how native, high-resolution datasets including GIS, CAD, 3D models, LIDAR, and FMV are fused together in real time with game-quality graphics and pixel-accurate analysis. The 3D engine uses a GPU-resident mesh that adapts to data of any resolution on the fly, eliminating the need to preprocess any data prior to real-time use. The demonstration will include Geoweb3d Mobile, which now uses HTML5 for use on any device in the cloud, including phones and tablets.

To register, follow this link: https://www2.gotomeeting.com/register/226039466

CfP: Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA2014)

March 28th, 2014

The workshop on Heterogeneous and Unconventional Cluster Architectures and Applications, held in conjunction with ICPP 2014, September 9-12, 2014, Minneapolis, MN, USA, aims to gather recent work on heterogeneous and unconventional cluster architectures and applications that might have a big impact on future cluster architectures. This includes any cluster architecture that is not based on the usual commodity components and therefore makes use of special hardware or software elements, or that is used for very special and unconventional applications. In particular, we call for work on GPUs and other accelerators (Intel MIC/Xeon Phi, FPGA) used at cluster level. Other examples include virtualization, in-memory storage, hardware and software interactions, run-times, databases, and device-to-device communication. We particularly encourage work on disruptive approaches, which may show inferior performance today but can already demonstrate their performance potential. The broad scope of the workshop facilitates submissions on unconventional uses of hardware or software, aiming to gather ideas that are coming to life now and not limiting them except for their context: clusters. These proposals may also reflect a broader industry trend.

We are seeking new proposals presented from a holistic perspective. In this regard, one of the aims of the workshop is to anticipate the evolution of clusters. Instead of just presenting new work carried out in the traditional cluster areas usually addressed in other conferences and workshops, we aim to create the right atmosphere for a discussion of opportunities in cluster computing. Contributions will therefore be accepted not only on their technical merits but also on their contribution to this discussion.

More information: http://www.hucaa-workshop.org/hucaa2014
