Call for Posters Open for GPU Technology Conference 2014

November 4th, 2013

The Call for Posters for next year’s GTC is now open. New at GTC 2014 will be an award for the best poster at the show. Don’t miss this opportunity to present the innovative work you’re doing to other top developers, researchers, and engineers. Start now by reviewing the submission guidelines and criteria, and plan to submit for a chance to take home well-deserved bragging rights.

All the information you need is available at http://www.gputechconf.com/page/call-for-posters.html.

Which is faster: Constant Cache or Read-only Cache? – Part Two

November 4th, 2013

One of the keys to achieving maximum performance in CUDA is taking advantage of its various memory spaces. Part II of Acceleware’s tutorial has now been published. The tutorial uses a simple encryption kernel to test and compare the read-only cache, the constant cache, and global memory. Read the full tutorial…

CfP: High Performance Computing in Bioinformatics

November 4th, 2013

2014 International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2014)
7-9 April, 2014. Granada (SPAIN). Special Session: High Performance Computing in Bioinformatics

The goal of this special session is to explore the use of emerging parallel computing architectures as well as High Performance Computing systems (supercomputers, clusters, grids) for the simulation of relevant biological systems and for applications in Bioinformatics, Computational Biology and Computational Chemistry. We welcome papers, not submitted elsewhere for review, on topics of interest including, but not limited to: Read the rest of this entry »

Webinar: Face-in-the-crowd recognition with GPUs

November 4th, 2013

A free webinar on accelerating face-in-the-crowd recognition with GPU technology will be held on November 5th. It covers how GPUs can be used to accelerate the detection and recognition of faces in a crowd. The presentation will also cover the speakers’ use of the ROS, OpenCV, OpenMP, and Armadillo libraries to develop fast, reliable, distributed video processing code. To register, follow the link: https://www2.gotomeeting.com/register/292953058

A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation

October 19th, 2013

Abstract:

We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm performs intra-object and inter-object collision handling, resolves contacts and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues related to obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3 GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPU-based parallel implementation.
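
The pipeline stages the abstract enumerates (time integration, collision detection, collision response, velocity updating) can be illustrated with a deliberately minimal sketch. This is not the paper’s method: it uses symplectic Euler integration on free-falling 2D particles and reduces collision handling to a ground-plane clamp, purely to show how the stages compose in one simulation step.

```python
# One simulation step over particle positions/velocities.
# Illustration only: symplectic Euler integration, with collision
# detection/response reduced to a ground-plane (y = 0) clamp.

GRAVITY = -9.81

def step(positions, velocities, dt):
    # Time integration: advance velocities first, then positions.
    velocities = [(vx, vy + GRAVITY * dt) for vx, vy in velocities]
    positions = [(px + vx * dt, py + vy * dt)
                 for (px, py), (vx, vy) in zip(positions, velocities)]
    # Collision detection + response: keep particles above the ground.
    out_p, out_v = [], []
    for (px, py), (vx, vy) in zip(positions, velocities):
        if py < 0.0:               # penetration detected
            py, vy = 0.0, 0.0      # project out and zero normal velocity
        out_p.append((px, py))
        out_v.append((vx, vy))
    return out_p, out_v
```

In the paper each of these stages is instead a set of GPU kernels operating on a triangle mesh, with full intra-object and inter-object collision handling.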

(Min Tang, Ruofeng Tong, Rahul Narain, Chang Meng and Dinesh Manocha: “A GPU-based Streaming Algorithm for High-Resolution Cloth Simulation”, in the Proceedings of Pacific Graphics 2013. [WWW])

cupSODA: a CUDA-Powered Simulator of Mass-action Kinetics

October 19th, 2013

Abstract:

The computational investigation of a biological system often requires the execution of a large number of simulations to analyze its dynamics, and to derive useful knowledge about its behavior under physiological and perturbed conditions. This analysis usually incurs very high computational costs when simulations are run on central processing units (CPUs), therefore demanding a shift to high-performance processors. In this work we present a simulator of biological systems, called cupSODA, which exploits the higher memory bandwidth and computational capability of graphics processing units (GPUs). This software makes it possible to execute parallel simulations of the dynamics of biological systems by first deriving a set of ordinary differential equations from reaction-based mechanistic models defined according to mass-action kinetics, and then exploiting the numerical integration algorithm LSODA. We show that cupSODA can achieve a 112× speedup on GPUs with respect to equivalent executions of LSODA on CPUs.
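
The derivation the abstract describes, from mass-action kinetics to a system of ODEs, can be shown on a toy reaction A + B → C with rate constant k: each species’ rate of change is proportional to the product of the reactant concentrations. cupSODA integrates such systems with LSODA; the fixed-step RK4 integrator below is only a stand-in for illustration, not cupSODA’s code.

```python
# Mass-action kinetics for A + B -> C with rate constant k gives:
#   dA/dt = -k*A*B,  dB/dt = -k*A*B,  dC/dt = +k*A*B

def mass_action_rhs(state, k=1.0):
    a, b, c = state
    v = k * a * b              # reaction velocity under mass-action kinetics
    return (-v, -v, +v)

def rk4_step(f, state, dt):
    # Classical fixed-step 4th-order Runge-Kutta (stand-in for LSODA).
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)))
    k4 = f(tuple(s + dt * d for s, d in zip(state, k3)))
    return tuple(s + dt / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))

def simulate(state, dt=0.01, steps=1000):
    for _ in range(steps):
        state = rk4_step(mass_action_rhs, state, dt)
    return state

a, b, c = simulate((1.0, 1.0, 0.0))   # integrate to t = 10
```

Note the linear invariant A + C = const, which the integration preserves; cupSODA’s contribution is running thousands of such integrations in parallel on a GPU.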

(Nobile M.S., Besozzi D., Cazzaniga P., Mauri G., Pescini D.: “cupSODA: a CUDA-Powered Simulator of Mass-action Kinetics”, in Proceedings of the 12th International Conference on Parallel Computing Technologies (PaCT), Lecture Notes in Computer Science, volume 7979, pp. 344-357, 2013. [DOI])

Performance benchmarks on CPU/GPU/Xeon Phi

October 19th, 2013

PARALUTION is a library for sparse iterative methods that can run on various parallel devices, including multi-core CPUs and GPUs. The new 0.4.0 version of the library also provides a backend for the Xeon Phi (MIC). With this release, various performance benchmarks based on vector-vector routines, sparse matrix-vector multiplication, and the CG method have been published for the different backends: OpenMP, CUDA, and OpenCL on NVIDIA GPUs, AMD GPUs, CPUs, and Xeon Phi. More information: http://www.paralution.com/benchmarks/
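
The CG (conjugate gradient) method named among the benchmarks is a standard iterative solver for symmetric positive-definite systems Ax = b, built from exactly the kernels being benchmarked: matrix-vector products and vector-vector operations. The sketch below is textbook CG on a tiny dense system, not PARALUTION’s API; a real backend would use a sparse matrix format and run these kernels on the device.

```python
# Minimal conjugate-gradient solver for a symmetric positive-definite
# system Ax = b; dense Python lists stand in for a sparse format.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cg(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = b[:]                      # residual of the zero initial guess
    p = r[:]                      # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

x = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])   # exact solution (1/11, 7/11)
```

The per-iteration cost is dominated by the sparse matrix-vector product, which is why that kernel features so prominently in the published benchmarks.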

Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization

October 7th, 2013

Abstract:

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is impaired primarily by acquisition costs and power consumption. Therefore, the benefits of sharing a reduced number of GPUs among all the nodes of a cluster can be remarkable for many applications. This approach, usually referred to as remote GPU virtualization, aims at reducing the number of GPUs present in a cluster, while increasing their utilization rate. The performance of the interconnection network is key to achieving reasonable performance results by means of remote GPU virtualization. To this end, several networking technologies with throughput comparable to that of PCI Express have appeared recently. In this paper we analyze the influence of InfiniBand FDR on the performance of remote GPU virtualization, comparing its impact on a variety of GPU-accelerated applications with other networking technologies, such as InfiniBand QDR and Gigabit Ethernet. Given the severe limitations of freely available remote GPU virtualization solutions, the rCUDA framework is used as the case study for this analysis. Results show that the new FDR interconnect, featuring higher bandwidth than its predecessors, allows the reduction of the overhead of using GPUs remotely, thus making this approach even more appealing.

(Carlos Reano, Rafael Mayo, Enrique S. Quintana-Ortí, Federico Silla, José Duato and Antonio J. Pena: “Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization”. Proceedings of the IEEE Cluster 2013 Conference, Indianapolis, USA, September 2013. [PDF])

A High Performance Copy-Move Image Forgery Detection Scheme on GPU

October 7th, 2013

Abstract:

This paper presents an accelerated version of a copy-move image forgery detection scheme on graphics processing units (GPUs). With the replacement of analog cameras by their digital counterparts and the availability of powerful image processing software packages, the authentication of digital images has gained importance in the recent past. This paper focuses on improving the performance of a copy-move forgery detection scheme based on radix sort by porting it onto the GPU. This scheme has enhanced performance and is much more efficient than other methods, without degradation of detection results. The CPU version of the radix-sort based detection scheme was developed in Matlab, and critical sections were coded in C using Matlab’s MEX interface to get maximum performance. The GPU version was developed using the Jacket GPU Engine for Matlab and performs over twelve times faster than its optimized CPU variant. The contributions this paper makes to blind image forensics are the use of integral images for computing feature vectors of overlapping blocks in the block-matching technique, and the acceleration of the entire copy-move forgery detection scheme on the GPU, not found in the literature.
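
The integral-image (summed-area table) technique the abstract highlights is what makes feature computation over the many overlapping blocks cheap: after one pass over the image, the sum of any rectangular block is four lookups, independent of block size. A minimal pure-Python sketch (helper names are illustrative, not the paper’s code):

```python
# Build a summed-area table with a one-pixel zero border, so that the
# sum over any block becomes an O(1) four-corner lookup.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def block_sum(ii, y, x, bh, bw):
    # Sum of the bh-by-bw block whose top-left pixel is (y, x).
    return (ii[y + bh][x + bw] - ii[y][x + bw]
            - ii[y + bh][x] + ii[y][x])
```

With this table, the per-block feature sums needed for block matching cost the same for 8×8 blocks as for any other size, which is the property the paper exploits.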

(Jaideep Singh and Balasubramanian Raman, “A High Performance Copy-Move Image Forgery Detection Scheme on GPU”, Advances in Intelligent and Soft Computing Volume 131, 2012, pp 239-246, Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011). [DOI])

1000th GPGPU Post!

September 30th, 2013

The GPGPU.org editors (Mark Harris and Dominik Goeddeke) were excited to notice that our previous post was the 1000th post on GPGPU.org! GPGPU continues to grow as fast as ever, and we’re excited to see what it brings in the future.

We’d like to use this opportunity to remind you that GPGPU.org relies on our readers to submit news. So please, visit http://gpgpu.org/submit-news/ and tell us about your project, publications, or the work of others you think we should share.

Thanks for an amazing 11 years and hundreds of news submissions!
