Webinar on April 8th: Geospatial 3D Visualization in the Cloud with GPUs

April 2nd, 2014

This webinar covers how Geoweb3d uses the GPU for real-time geospatial 3D visualization, modeling, and analytics. Geoweb3d will demonstrate how native, high-resolution datasets including GIS, CAD, 3D models, LIDAR, and FMV are fused together in real time with game-quality graphics and pixel-accurate analysis. The 3D engine uses a GPU-resident mesh that adapts to data of any resolution on the fly, eliminating the need to preprocess data prior to real-time use. The demonstration will include Geoweb3d Mobile, which now uses HTML5 for use on any device in the cloud, including phones and tablets.

To register follow this link: https://www2.gotomeeting.com/register/226039466

CfP: Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA2014)

March 28th, 2014

The workshop on Heterogeneous and Unconventional Cluster Architectures and Applications, held in conjunction with ICPP 2014, September 9-12, 2014, Minneapolis, MN, USA, aims to gather recent work on heterogeneous and unconventional cluster architectures and applications, which might have a big impact on future cluster architectures. This includes any cluster architecture that is not based on the usual commodity components and therefore makes use of some special hardware or software elements, or that is used for very special and unconventional applications. In particular, we call for work on GPUs and other accelerators (Intel MIC/Xeon Phi, FPGA) used at cluster level. Other examples include virtualization, in-memory storage, hardware and software interactions, run-times, databases, and device-to-device communication. We particularly encourage work on disruptive approaches, which may show inferior performance today but can already point out their performance potential. The broad scope of the workshop facilitates submissions on unconventional uses of hardware or software, aiming to gather ideas that are coming to life now and not limiting them except for their context: clusters. These proposals may also reflect a broader industry trend.

We are seeking new proposals presented from a holistic perspective. In this regard, one of the aims of the workshop is to anticipate the evolution of clusters. Instead of just presenting new work carried out in the traditional cluster areas usually addressed in other conferences and workshops, we aim to create the right atmosphere for a discussion of opportunities in cluster computing. Contributions will therefore be accepted not only according to their technical merits but also according to their contribution to this discussion.

More information: http://www.hucaa-workshop.org/hucaa2014

New rCUDA 4.1 version available

March 26th, 2014

A new version of the rCUDA middleware has been released (version 4.1). In addition to fixing some bugs related to asynchronous memory transfers, the new release provides support for:

  • CUDA 5.5 Runtime API
  • Mellanox Connect-IB network adapters
  • Dynamic Parallelism
  • cuFFT and cuBLAS libraries

The rCUDA middleware allows you to seamlessly use, within your cluster, GPUs that are installed in computing nodes different from the one that is executing the CUDA application, without having to modify or recompile your program. Please visit www.rcuda.net for more details about the rCUDA technology.

High-Performance Image Synthesis for Radio Interferometry

March 26th, 2014

Abstract:

A radio interferometer indirectly measures the intensity distribution of the sky over the celestial sphere. Since measurements are made over an irregularly sampled Fourier plane, synthesising an intensity image from interferometric measurements requires substantial processing. Furthermore there are distortions that have to be corrected. In this thesis, a new high-performance image synthesis tool (imaging tool) for radio interferometry is developed. Implemented in C++ and CUDA, the imaging tool achieves unprecedented performance by means of Graphics Processing Units (GPUs). The imaging tool is divided into several components, and the back-end handling numerical calculations is generalised in a new framework. A new feature termed compression arbitrarily increases the performance of an already highly efficient GPU-based implementation of the w-projection algorithm. Compression takes advantage of the behaviour of oversampled convolution functions and the baseline trajectories. A CPU-based component prepares data for the GPU which is multi-threaded to ensure maximum use of modern multi-core CPUs. Best performance can only be achieved if all hardware components in a system do work in parallel. The imaging tool is designed such that disk I/O and work on CPU and GPUs is done concurrently. Test cases show that the imaging tool performs nearly 100× faster than another general CPU-based imaging tool. Unfortunately, the tool is limited in use since deconvolution and A-projection are not yet supported. It is also limited by GPU memory. Future work will implement deconvolution and A-projection, whilst finding ways of overcoming the memory limitation.

(Daniel Muscat: “High-Performance Image Synthesis for Radio Interferometry”. Preprint, 2014. [arXiv])

cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

March 26th, 2014

Abstract:

Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia’s Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae.

(Nobile M.S., Cazzaniga P., Besozzi D., Pescini D., Mauri G.: “cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems”. PLoS ONE 9(3): e91963. [DOI])
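For readers unfamiliar with tau-leaping, the following is a minimal CPU sketch of the idea, not the cuTauLeaping implementation. It assumes a single hypothetical decay reaction X → ∅ with rate constant `c` and a fixed step `tau` (practical tau-leaping selects the step adaptively); the function names and parameter values are illustrative only. Each run is independent of the others, which is the property the abstract relies on for parallelization.

```python
import math
import random


def poisson(rng, mean):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def tau_leap_decay(x0, c, tau, t_end, rng):
    """Simulate X -> (nothing) with a fixed leap size tau (an assumption)."""
    x = x0
    for _ in range(round(t_end / tau)):
        propensity = c * x                      # rate of the decay reaction
        firings = poisson(rng, propensity * tau)  # reactions fired in this leap
        x -= min(firings, x)                    # never drop below zero molecules
    return x


def run_ensemble(n_runs, x0=100, c=0.1, tau=0.1, t_end=10.0, seed=0):
    """Run independent simulations; each could map to one GPU thread."""
    return [
        tau_leap_decay(x0, c, tau, t_end, random.Random(seed + i))
        for i in range(n_runs)
    ]


if __name__ == "__main__":
    finals = run_ensemble(200)
    print("mean final count:", sum(finals) / len(finals))
```

With these illustrative values the ensemble mean should land near the deterministic solution 100·exp(−1), although individual runs scatter around it.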

GPU-Accelerated Analysis and Visualization of Large Structures Solved by Molecular Dynamics Flexible Fitting

March 26th, 2014

Abstract:

Hybrid structure fitting methods combine data from cryo-electron microscopy and X-ray crystallography with molecular dynamics simulations for the determination of all-atom structures of large biomolecular complexes. Evaluating the quality-of-fit obtained from hybrid fitting is computationally demanding, particularly in the context of a multiplicity of structural conformations that must be evaluated. Existing tools for quality-of-fit analysis and visualization have previously targeted small structures and are too slow to be used interactively for large biomolecular complexes of particular interest today such as viruses or for long molecular dynamics trajectories as they arise in protein folding. We present new data-parallel and GPU-accelerated algorithms for rapid interactive computation of quality-of-fit metrics linking all-atom structures and molecular dynamics trajectories to experimentally-determined density maps obtained from cryo-electron microscopy or X-ray crystallography. We evaluate the performance and accuracy of the new quality-of-fit analysis algorithms vis-a-vis existing tools, examine algorithm performance on GPU-accelerated desktop workstations and supercomputers, and describe new visualization techniques for results of hybrid structure fitting methods.

(John E. Stone, Ryan McGreevy, Barry Isralewitz, and Klaus Schulten: “GPU-Accelerated Analysis and Visualization of Large Structures Solved by Molecular Dynamics Flexible Fitting”. Faraday Discussion 169, 2014. [DOI])

GPU Boost on NVIDIA’s Tesla K40 GPUs

March 26th, 2014

This blog post explains GPU Boost, a new user-controllable feature available on Tesla GPUs. Case studies and benchmarks for reverse time migration and an electromagnetic solver are discussed.

Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA

March 19th, 2014

Abstract:

In this paper, we propose an efficient acceleration method for the nonrigid registration of multimodal images that uses a graphics processing unit (GPU). The key contribution of our method is efficient utilization of on-chip memory for both normalized mutual information (NMI) computation and hierarchical B-spline deformation, which compose a well-known registration algorithm. We implement this registration algorithm as a compute unified device architecture (CUDA) program with an efficient parallel scheme and several optimization techniques such as hierarchical data organization, data reuse, and multiresolution representation. We experimentally evaluate our method with four clinical datasets consisting of up to 512x512x296 voxels. We find that exploitation of onchip memory achieves a 12-fold increase in speed over an off-chip memory version and, therefore, it increases the efficiency of parallel execution from 4% to 46%. We also find that our method running on a GeForce GTX 580 card is approximately 14 times faster than a fully optimized CPU-based implementation running on four cores. Some multimodal registration results are also provided to understand the limitation of our method. We believe that our highly efficient method, which completes an alignment task within a few tens of second, will be useful to realize rapid nonrigid registration.

(Kei Ikeda, Fumihiko Ino, and Kenichi Hagihara: “Efficient Acceleration of Mutual Information Computation for Nonrigid Registration Using CUDA”. Accepted for publication in the IEEE Journal of Biomedical and Health Informatics. [DOI])
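As an illustration of the quantity being accelerated, here is a small CPU sketch of normalized mutual information computed from a joint histogram. It assumes the commonly used definition NMI(A, B) = (H(A) + H(B)) / H(A, B); the paper's exact formulation, binning, and on-chip memory layout are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a histogram given as a Counter."""
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_mutual_information(a, b):
    """NMI of two equally sized sequences of discrete intensities."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(a)
    h_a = entropy(Counter(a), total)           # marginal entropy of image A
    h_b = entropy(Counter(b), total)           # marginal entropy of image B
    h_ab = entropy(Counter(zip(a, b)), total)  # entropy of the joint histogram
    if h_ab == 0:
        raise ValueError("NMI is undefined when both images are constant")
    return (h_a + h_b) / h_ab


if __name__ == "__main__":
    print(normalized_mutual_information([0, 1, 2, 3], [0, 1, 2, 3]))
    print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Under this definition the value ranges from 1 for statistically independent images to 2 for identical ones, so a registration loop would search for the deformation that maximizes it.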

CfP: 7th Workshop on UnConventional High Performance Computing 2014 (UCHPC 2014)

March 10th, 2014

The 7th UCHPC workshop will be held in conjunction with Euro-Par 2014, August 25 – August 29, in Porto, Portugal.

Recent issues with the power consumption of conventional HPC hardware have resulted in new interest both in accelerator hardware and in the use of mass-market hardware not originally designed for HPC. The most prominent examples are GPUs, but FPGAs, DSPs and embedded designs are also possible candidates to provide higher power efficiency, as they are used in energy-restricted environments, such as smartphones or tablets. The so-called “dark silicon” forecast, i.e. that not all transistors may be active at the same time, may lead to even more specialized hardware in future mass-market products. Exploiting this hardware for HPC can be a worthwhile challenge.

CfP: 2nd Workshop on Parallel and Distributed Agent-Based Simulations (PADABS 2014)

March 10th, 2014

Agent-Based Simulation Models are an increasingly popular tool for research and management in many fields such as ecology, economics and sociology. In some fields, such as the social sciences, these models are seen as a key instrument of the generative approach, essential for understanding complex social phenomena. The relevance and effectiveness of Agent-Based Simulation Models has also recently been recognized in policy-making, biology, military simulations, control of mobile robots and economics.

Several frameworks have been recently developed and are active in this field. They range from GPU-manycore approaches to parallel and/or distributed simulation environments.

The key objective of this workshop is to bring together researchers who are interested in getting more performance from their simulations by using synchronized, many-core simulations (e.g., GPUs), strongly coupled, parallel simulations (e.g., MPI) and loosely coupled, distributed simulations (distributed heterogeneous setting). More information: http://www.padabs.org/
