CfP: 23rd High Performance Computing Symposium (HPC’15)

November 14th, 2014

The 23rd High Performance Computing Symposium (HPC’15) will be held in conjunction with the SCS Spring Simulation Multiconference (SpringSim’15), April 12-15, 2015, in Alexandria, VA, USA.

Topics of interest include:

  • High performance/large scale application case studies
  • GPU for general purpose computations (GPGPU)
  • Multicore and many-core computing
  • Power aware computing
  • Cloud, distributed, and grid computing
  • Asynchronous numerical methods and programming
  • Hybrid system modeling and simulation
  • Large scale visualization and data management
  • Tools and environments for coupling parallel codes
  • Parallel algorithms and architectures
  • High performance software tools
  • Resilience at the simulation level
  • Component technologies for high performance computing

More information:

PARALUTION v0.8.0 released

November 14th, 2014

PARALUTION is a library for sparse iterative methods which can be performed on various parallel devices, including multi-core CPU, GPU (CUDA and OpenCL) and Intel Xeon Phi. The new 0.8.0 release provides the following extra features:

  • Complex support
  • TNS, Variable preconditioner
  • BiCGStab(l), QMRCGStab, FCG solvers
  • RS and PairWise AMG
  • SIRA eigenvalue solver
  • Replace/Extract column/row functions
  • Stencil computation

For details, visit

Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization

November 3rd, 2014


The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems under different conditions, which requires computing power that usually exceeds the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool developed specifically for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA runs parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then integrating it numerically with the LSODA algorithm. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and two-dimensional parameter sweep analyses, and show that GPU-accelerated parallel simulations of this model achieve up to a 181× speedup over the corresponding sequential simulations.

(Cazzaniga P., Nobile M.S., Besozzi D., Bellini M., Mauri G.: “Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization”. BioMed Research International, vol. 2014. [DOI])
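A one-dimensional parameter sweep of the kind described in the abstract can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the coagSODA implementation: the model is a toy reversible reaction A + B <-> C (not the blood coagulation cascade), the function names are hypothetical, and a fixed-step fourth-order Runge-Kutta integrator stands in for LSODA. The point is only that the ODE right-hand side follows mechanically from mass-action kinetics and that every sweep point is an independent simulation, which is why the workload maps well onto a GPU.

```python
# Toy illustration only: one reversible reaction A + B <-> C under
# mass-action kinetics. This is NOT the blood coagulation model.

def mass_action_rhs(state, k_on, k_off):
    """Right-hand side of the ODE system derived from A + B <-> C."""
    a, b, c = state
    flux = k_on * a * b - k_off * c  # net forward reaction rate
    return (-flux, -flux, flux)


def rk4_step(state, dt, k_on, k_off):
    """One classical Runge-Kutta step (a simple stand-in for LSODA)."""
    def shifted(base, slope, h):
        return tuple(x + h * s for x, s in zip(base, slope))

    k1 = mass_action_rhs(state, k_on, k_off)
    k2 = mass_action_rhs(shifted(state, k1, dt / 2), k_on, k_off)
    k3 = mass_action_rhs(shifted(state, k2, dt / 2), k_on, k_off)
    k4 = mass_action_rhs(shifted(state, k3, dt), k_on, k_off)
    return tuple(
        x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)
    )


def simulate(k_on, k_off=0.1, t_end=10.0, dt=0.01, initial=(1.0, 1.0, 0.0)):
    """Integrate the toy model from t = 0 to t_end and return the final state."""
    state = initial
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, k_on, k_off)
    return state


def sweep(k_on_values):
    """One-dimensional sweep: each value yields an independent simulation."""
    return {k_on: simulate(k_on) for k_on in k_on_values}


if __name__ == "__main__":
    for k_on, (a, b, c) in sweep([0.1, 1.0, 10.0]).items():
        print(f"k_on={k_on:5.1f}  A={a:.4f}  B={b:.4f}  C={c:.4f}")
```

A two-dimensional sweep would iterate over pairs such as `(k_on, k_off)` in the same way. Because no simulation depends on another, a GPU version could in principle assign one simulation per thread.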

On Demand Webinar: Essential CUDA Optimization Techniques

November 3rd, 2014

This webinar provides an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-bound, latency-bound, and memory-bound problems. It covers techniques for ensuring peak utilization of CUDA cores, improving branching efficiency, and using intrinsic functions and loop unrolling. Optimal access patterns for global and shared memory are presented, including a comparison between the Fermi and Kepler architectures. To view the webinar go to:

CfP: Optimization of Parallel Scientific Applications with Accelerated HPC

October 29th, 2014

Since 2011, the most powerful supercomputer systems ranked in the Top500 list have been hybrid systems composed of thousands of nodes that include CPUs and accelerators such as Xeon Phi and GPUs. Programming and deploying applications on those systems is still a challenge due to the complexity of the systems and the need to mix several programming interfaces (MPI, CUDA, Intel Xeon Phi) in the same application. This special issue of the International Journal of Computers & Electrical Engineering aims to explore the state of the art in developing applications for massive accelerated HPC architectures, including practical issues of hybrid usage models with MPI, OpenMP, and other accelerator programming models. The idea is to publish novel work on the use of available programming interfaces (MPI, CUDA, Intel Xeon Phi) and tools for code development, application performance optimization, and application deployment on accelerated systems, as well as on the advantages and limitations of accelerated HPC systems. Experiences with real-world applications, including scientific computing, numerical simulations, healthcare, energy, and data analysis, are also encouraged.


CfP: GPGPU 2015

October 29th, 2014

The goal of this workshop is to provide a forum to discuss new and emerging general-purpose programming environments and platforms, as well as to evaluate applications that have been able to harness the horsepower provided by these platforms. This year’s workshop is particularly interested in new heterogeneous GPU platforms, new forms of concurrency, and novel/irregular applications that can leverage these platforms. Papers are being sought on many aspects of GPUs.

CUDA finance course Dec 2-5, 2014, New York

October 22nd, 2014

Developed in partnership with NVIDIA, this hands-on four-day course will teach you how to write and optimize applications that fully leverage the multi-core processing capabilities of the GPU. The course has a finance focus: commonly used algorithms such as random number generation and Monte Carlo simulation will be used and profiled in examples. A background in finance is not necessary. For more information please visit:

Cf4ocl Brings Object-Oriented API to OpenCL C API

October 22nd, 2014

The Cf4ocl project is a GPLv3/LGPLv3 initiative to provide an object-oriented interface to the OpenCL C API with integrated profiling, promoting the rapid development of OpenCL host programs and avoiding boilerplate code. Its main goal is to allow developers to focus on OpenCL device code. After two alpha releases, the first beta is out and can be tested on Linux, Windows and OS X. The framework is independent of the OpenCL platform version and vendor, and includes utilities to simplify the analysis of the OpenCL environment and of kernel requirements. While the project is making progress, it does not yet offer OpenGL/DirectX interoperability or support for sub-devices, pipes, and SVM.

Cf4ocl can be downloaded from

Release of OpenCLIPP 2.0: an OpenCL library for computer vision and image processing

October 16th, 2014

Version 2.0 of OpenCLIPP, an open source OpenCL library for computer vision and image processing primitives, has been released. For more information about the library, for programming contributions and for download, please refer to the OpenCLIPP Website.

Approximate TF–IDF based on topic extraction from massive message stream using the GPU

October 16th, 2014


The Web is a constantly expanding global information space that includes disparate types of data and resources. Recent trends demonstrate the urgent need to manage large data streams, especially in specific application domains such as critical infrastructure systems, sensor networks, log file analysis, search engines and, more recently, social networks. All of these applications involve large-scale data-intensive tasks, often subject to time constraints and space complexity. Algorithms, data management and data retrieval techniques must be able to process data streams, i.e., process data as it becomes available and provide an accurate response based solely on the data that has already been received. Data retrieval techniques, however, often rely on a traditional storage and processing approach, i.e., all data must be available in the storage space before it can be processed. For instance, a widely used relevance measure is Term Frequency–Inverse Document Frequency (TF–IDF), which evaluates how important a word is in a collection of documents and requires the whole dataset to be known a priori.
To address this problem, we propose an approximate version of the TF–IDF measure suitable for continuous data streams (such as exchanges of messages, tweets and sensor-based log files). The algorithm for calculating this measure makes two assumptions: a fast response is required, and memory is limited and far smaller than the size of the data stream. In addition, to meet the great computational demand of processing massive data streams, we also present a parallel implementation of the approximate TF–IDF calculation using Graphics Processing Units (GPUs).
This implementation of the algorithm was tested on both generated and real data streams and was able to capture the most frequent terms. Our results demonstrate that the approximate version of the TF–IDF measure performs at a level comparable to that of the exact TF–IDF measure.

(Ugo Erra, Sabrina Senatore, Fernando Minnella and Giuseppe Caggianese: “Approximate TF-IDF based on topic extraction from massive message stream using the GPU”, Information Sciences 292, pp.141-163, Feb. 2015. [DOI])
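For context, the exact TF–IDF measure that the abstract contrasts against can be sketched as follows. This is a minimal sketch using one common textbook variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency). The paper may use a different weighting, the function name `tf_idf` is hypothetical, and this is not the authors' approximate streaming algorithm. It does show why the exact measure needs the whole dataset: the document count and the per-term document frequencies are known only after every document has been seen.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Exact TF-IDF scores for a list of tokenized documents.

    Uses tf = count / document length and idf = ln(N / df), one common
    textbook variant (an assumption, not necessarily the paper's choice).
    """
    n_docs = len(documents)

    # First pass over ALL documents: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    # Second pass: weight each term's relative frequency by its idf.
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    messages = [["gpu", "stream", "gpu"], ["stream", "data"]]
    for doc_scores in tf_idf(messages):
        print(doc_scores)
```

A term that occurs in every document gets an idf of zero under this variant, so only terms that distinguish a document receive a positive score.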
