Fine-grained GPU implementation of Assembly-Free Iterative Solver for Finite Element Problems

June 11th, 2015


The use of GPU computing in FEA is an active research field today. This is primarily because current GPU sparse solvers are only partially parallelizable and can hardly exploit the Data-Level Parallelism (DLP) for which GPU architectures are designed. This paper proposes a fine-grained implementation of a matrix-free Conjugate Gradient (CG) solver for Finite Element Analysis (FEA) using Graphics Processing Unit (GPU) architectures. The proposed GPU instance takes advantage of Massively Parallel Processing (MPP) architectures by performing well-balanced parallel calculations at the Degree-of-Freedom (DoF) level of finite elements. The numerical experiments evaluate and analyze the performance of diverse GPU instances of the matrix-free CG solver.

Jesús Martínez-Frutos, Pedro J. Martínez-Castejón, David Herrero-Pérez, Fine-grained GPU implementation of assembly-free iterative solver for finite element problems, Computers & Structures, Volume 157, September 2015, Pages 9-18, ISSN 0045-7949.
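
For readers unfamiliar with the term, the sketch below illustrates what "matrix-free" (assembly-free) CG means in general. It is not the paper's GPU implementation: it is a serial Python toy, and the 1D Poisson model problem, the function names `apply_stiffness` and `conjugate_gradient`, and the tolerance are all assumptions chosen for illustration. The product of the stiffness matrix with a vector is computed element by element, so the global matrix is never stored.

```python
# Illustrative only: serial, 1D, linear elements. All names are hypothetical.

def apply_stiffness(u, h):
    """Return K*u for -u'' on a uniform mesh without assembling K.

    `u` holds interior nodal values; both boundary values are fixed at zero.
    Each element contributes (1/h) * [[1, -1], [-1, 1]] to its two nodes.
    """
    padded = [0.0] + list(u) + [0.0]
    y = [0.0] * len(padded)
    for e in range(len(padded) - 1):  # element e joins nodes e and e+1
        flux = (padded[e] - padded[e + 1]) / h
        y[e] += flux
        y[e + 1] -= flux
    return y[1:-1]  # drop the rows of the fixed boundary nodes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator `apply_A`."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = apply_A(p)  # the only place the operator is used
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Model problem: -u'' = 1 on (0, 1) with u(0) = u(1) = 0, 8 elements.
h = 1.0 / 8
load = [h] * 7  # consistent load vector for a unit source
u = conjugate_gradient(lambda v: apply_stiffness(v, h), load)
```

In a fine-grained GPU version, the element loop inside the operator is the part that would be distributed across threads; how the paper maps that work to the DoF level is described in the article itself and is not reproduced here.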


A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems

May 18th, 2015


As the number of cores on a chip increases and key applications become even more data-intensive, memory systems in modern processors have to deal with increasingly large amounts of data. In the face of such challenges, data compression is a promising approach to increase effective memory system capacity and also provide performance and energy advantages. This paper presents a survey of techniques for using compression in cache and main memory systems. It also classifies the techniques based on key parameters to highlight their similarities and differences. It discusses compression in CPUs and GPUs, conventional and non-volatile memory (NVM) systems, and 2D and 3D memory systems. We hope that this survey will help researchers gain insight into the potential role of compression in the memory components of future extreme-scale systems.

Sparsh Mittal and Jeffrey Vetter, “A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems”, IEEE TPDS 2015. WWW

A Survey of CPU-GPU Heterogeneous Computing Techniques

May 18th, 2015


As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler and application levels. Further, we review both discrete and fused CPU-GPU systems, and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide researchers with insights into the working and scope of applications of HCTs and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.

Sparsh Mittal and Jeffrey Vetter, “A Survey of CPU-GPU Heterogeneous Computing Techniques”, accepted in ACM Computing Surveys, 2015. WWW

RapidCFD: open-source CFD for GPUs

April 13th, 2015

A new open-source CFD project has just been published. RapidCFD uses NVIDIA CUDA for the entire calculation process, which gives a significant reduction in computation time.


  • most incompressible and compressible solvers on static meshes are available
  • all the calculations are done on the GPU
  • no overhead for GPU-CPU memory copy
  • can run in parallel on multiple GPUs

Visit the RapidCFD project page.

A Survey Of Techniques for Managing and Leveraging Caches in GPUs

February 10th, 2015


Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general-purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs and the rise of CPU-GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide readers with insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.

Sparsh Mittal, “A Survey Of Techniques for Managing and Leveraging Caches in GPUs”, Journal of Circuits, Systems, and Computers (JCSC), vol. 23, no. 8, 2014. WWW

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency

February 10th, 2015


Recent years have witnessed phenomenal growth in the computational capabilities and applications of GPUs. However, this trend has also led to a dramatic increase in their power consumption. This paper surveys research works on analyzing and improving the energy efficiency of GPUs. It also provides a classification of these techniques on the basis of their main research idea. Further, it attempts to synthesize research works which compare the energy efficiency of GPUs with that of other computing systems, e.g. FPGAs and CPUs. The aim of this survey is to provide researchers with knowledge of the state of the art in GPU power management and motivate them to architect the highly energy-efficient GPUs of tomorrow.

Sparsh Mittal, Jeffrey S Vetter, “A Survey of Methods for Analyzing and Improving GPU Energy Efficiency”, in ACM Computing Surveys, vol. 47, no. 2, pp. 19:1-19:23, 2014. [WWW]

CfP: High Performance Computing Symposium

November 8th, 2012

The 21st High Performance Computing Symposium (HPC 2013) is devoted to the impact of high performance computing and communications on computer simulations. Advances in multicore and many-core architectures, networking, high end computers, large data stores, and middleware capabilities are ushering in a new era of high performance parallel and distributed simulations. Along with these new capabilities come new challenges in computing and system modeling. The goal of HPC 2013 is to encourage innovation in high performance computing and communication technologies and to promote synergistic advances in modeling methodologies and simulation. It will promote the exchange of ideas and information between universities, industry, and national laboratories about new developments in system modeling, high performance computing and communication, and scientific computing and simulation.

Symposium on Personal High-Performance Computing

September 20th, 2012

The Vrije Universiteit Brussel, Erasmus Hogeschool Brussel and Lessius Hogeschool are pleased to invite you to a symposium on Personal High-Performance Computing. The symposium aims to bring together academia and industry to discuss recent advances in using accelerators such as GPUs or FPGAs to speed up computationally intensive applications. We target single systems such as PCs, laptops or processor boards, hence the name ‘personal’ HPC.

Scientists are encouraged to submit abstracts to be presented at the poster session. All information can be found at

CfP: 3rd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS12)

August 11th, 2012

This workshop is concerned with the comparison of high-performance computing systems through performance modeling, benchmarking or the use of tools such as simulators. We are particularly interested in research which reports the ability to measure and make tradeoffs in software/hardware co-design to improve sustained application performance. We are also keen to capture the assessment of future systems, for example through work that ensures continued application scalability through peta- and exa-scale systems.


5th Workshop on UnConventional High Performance Computing 2012

June 3rd, 2012

Together with EuroPar-12, the 5th Workshop on UnConventional High Performance Computing 2012 (UCHPC 2012) will take place on August 27/28 on Rhodes Island, Greece. The workshop tries to capture solutions for HPC which are unconventional today but could become conventional and significant tomorrow. While GPGPU is already widely used in HPC, there are still all kinds of issues around best exploitation and productivity for the programmer. Submission deadline: June 6, 2012. For more details, see UPDATE: Submission deadline extended to June 11.

