Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has made `reliability’ a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. This paper provides a survey of architectural techniques for improving resilience of computing systems. We especially focus on techniques proposed for microarchitectural components, such as processor registers, functional units, cache and main memory etc. In addition, we discuss techniques proposed for non-volatile memory (NVM), GPUs and 3D-stacked processors. To underscore the similarities and differences of the techniques, we classify them based on their key characteristics. We also review the metrics proposed to quantify vulnerability of processor structures. We believe that this survey will help researchers, system-architects and processor designers in gaining insights into the techniques for improving reliability of computing systems.
Sparsh Mittal, Jeffrey S Vetter, “A Survey of Techniques for Modeling and Improving Reliability of Computing Systems”, in IEEE TPDS, 2015. WWW
To minimize interference in LTE networks, several inter-cell interference coordination (ICIC) techniques have been introduced. Among them, semi-static ICIC offers a balanced trade-off between applicability and system performance. The power allocation per resource block and cell is adapted in the range of seconds according to the load in the system. An open issue in the literature is the question how fast the adaptation should be performed. This leads basically to a trade-off between system performance and feasible computation times of the associated power allocation problems. In this work, we close this open issue by studying the impact that different durations of update times of semi-static ICIC have on the system performance. We conduct our study on realistic scenarios considering also the mobility of mobile terminals. Secondly, we also consider the implementation aspects of a semi-static ICIC. We introduce a very efficient implementation on general purpose graphic processing units, harnessing the parallel computing capability of such devices. We show that the update periods have a significant impact on the performance of cell edge terminals. Additionally, we present a graphic processing unit (GPU) based implementation which speeds up existing implementations up to a factor of 92x.
Parruca, Donald and Aizaz, Fahad and Chantaraskul, Soamsiri and Gross, James. “Semi-static Interference Coordination in OFDMA/LTE Networks: Evaluation of Practical Aspects. In Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pp 87-94 2014.
We introduce a practical partitioning technique designed for parallelizing Position Based Dynamics, and exploiting the ubiquitous multi-core processors present in current commodity GPUs. The input is a set of particles whose dynamics is influenced by spatial constraints. In the initialization phase, we build a graph in which each node corresponds to a constraint and two constraints are connected by an edge if they influence at least one common particle. We introduce a novel greedy algorithm for inserting additional constraints (phantoms) in the graph such that the resulting topology is qˆ-colourable, where qˆ ≥ 2 is an arbitrary number. We color the graph, and the constraints with the same color are assigned to the same partition. Then, the set of constraints belonging to each partition is solved in parallel during the animation phase. We demonstrate this by using our partitioning technique; the performance hit caused by the GPU kernel calls is significantly decreased, leaving unaffected the visual quality, robustness and speed of serial position based dynamics.
(Fratarcangeli M and Pellacini F, Scalable Partitioning for Parallel Position Based Dynamics, Computer Graphics Forum (Special Issue of Eurographics 2015 Conference). Vol. 34(2) 2015)
Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper also reviews those techniques which use GPU and FPGA to improve energy efficiency of embedded systems. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow.
Sparsh Mittal, “A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems”, International Journal of Computer Aided Engineering and Technology (IJCAET), vol 6, no. 4, 2014. WWW
GPUs play an increasingly important role in high-performance computing. While developing naive code is straightforward, optimizing massively parallel applications requires deep understanding of the underlying architecture. The developer must struggle with complex index calculations and manual memory transfers. This article classifies memory access patterns used in most parallel algorithms, based on Berkeley’s Parallel “Dwarfs.” It then proposes the MAPS framework, a device-level memory abstraction that facilitates memory access on GPUs, alleviating complex indexing using on-device containers and iterators. This article presents an implementation of MAPS and shows that its performance is comparable to carefully optimized implementations of real-world applications.
Rubin, Eri, et al. ["MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction."](http://dl.acm.org/citation.cfm?id=2680544) ACM Transactions on Architecture and Code Optimization (TACO) 11.4 (2014): 44.
The cellular process responsible for providing energy for most life on Earth, namely, photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. We present, in an accompanying movie, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movie is the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. We describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movie, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers.
Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail. M. Sener, J. E. Stone, A. Barragan, A. Singharoy, I. Teo, K. L. Vandivort, B. Isralewitz, B. Liu, B. Goh, J. C. Phillips, L. F. Kourkoutis, C. N. Hunter, and K. Schulten. SC’14 Visualization and Data Analytics Showcase, 2014. Paper PDF
Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU-GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
Sparsh Mittal, “A Survey Of Techniques for Managing and Leveraging Caches in GPUs”, Journal of Circuits, Systems, and Computers (JCSC), vol. 23, no. 8, 2014. WWW
Recent years have witnessed a phenomenal growth in the computational capabilities and applications of GPUs. However, this trend has also led to dramatic increase in their power consumption. This paper surveys research works on analyzing and improving energy efficiency of GPUs. It also provides a classification of these techniques on the basis of their main research idea. Further, it attempts to synthesize research works which compare energy efficiency of GPUs with other computing systems, e.g. FPGAs and CPUs. The aim of this survey is to provide researchers with knowledge of state-of-the-art in GPU power management and motivate them to architect highly energy-efficient GPUs of tomorrow.
Sparsh Mittal, Jeffrey S Vetter, “A Survey of Methods for Analyzing and Improving GPU Energy Efficiency”, in ACM Computing Surveys, vol. 47, no. 2, pp. 19:1-19:23, 2014. [WWW]
The wide majority of current state-of-the-art compressed GPU volume renderers are based on block-transform coding, which is susceptible to blocking artifacts, particularly at low bit-rates. In this paper the authors address the problem for the first time, by introducing a specialized deferred filtering architecture working on block-compressed data and including a novel deblocking algorithm. The architecture efficiently performs high quality shading of massive datasets by closely coordinating visibility- and resolution-aware adaptive data loading with GPU-accelerated per-frame data decompression, deblocking, and rendering. A thorough evaluation including quantitative and qualitative measures demonstrates the performance of our approach on large static and dynamic datasets including a massive 512^4 turbulence simulation (256GB), which is aggressively compressed to less than 2 GB, so as to fully upload it on graphics board and to explore it in real-time during animation.
(Fabio Marton, José Antonio Iglesias Guitián, Jose Díaz and Enrico Gobbetti: “Real-time deblocked GPU rendering of compressed volumes”. Proc. 19th International Workshop on Vision, Modeling and Visualization (VMV), pp. 167-174, Oct. 2014. [WWW])
The 23rd High Performance Computing Symposium (HPC’15) is held in conjunction with the SCS Spring Simulation Multiconference (SpringSim’15), April 12-15, 2015, in Alexandria, VA, USA.
Topics of interest include:
- High performance/large scale application case studies
- GPU for general purpose computations (GPGPU)
- Multicore and many-core computing
- Power aware computing
- Cloud, distributed, and grid computing
- Asynchronous numerical methods and programming
- Hybrid system modeling and simulation
- Large scale visualization and data management
- Tools and environments for coupling parallel codes
- Parallel algorithms and architectures
- High performance software tools
- Resilience at the simulation level
- Component technologies for high performance computing
More information: http://hosting.cs.vt.edu/hpc2015.