MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction

February 11th, 2015


GPUs play an increasingly important role in high-performance computing. While developing naive code is straightforward, optimizing massively parallel applications requires a deep understanding of the underlying architecture, and the developer must struggle with complex index calculations and manual memory transfers. This article classifies the memory access patterns used in most parallel algorithms, based on Berkeley’s Parallel “Dwarfs.” It then proposes the MAPS framework, a device-level memory abstraction that facilitates memory access on GPUs and alleviates complex indexing through on-device containers and iterators. This article presents an implementation of MAPS and shows that its performance is comparable to that of carefully optimized implementations of real-world applications.
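The core idea in the abstract is that a container plus an iterator can hide the index arithmetic of a memory access pattern. Below is a minimal host-side C++ sketch of that idea for a 2D neighborhood ("window") pattern of the kind used by stencils and image filters. `Window2D`, `box_sum_3x3`, and the clamp-to-edge boundary policy are hypothetical illustrations, not the MAPS API. The actual framework runs on the GPU, and this CPU-only sketch does not attempt to model that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (NOT the MAPS API): a read-only view of the
// (2R+1)x(2R+1) neighborhood around pixel (cx, cy) of a row-major image.
// It hides the row/column index arithmetic and clamps at the borders.
template <typename T, int R>
class Window2D {
public:
    Window2D(const std::vector<T>& data, int width, int height, int cx, int cy)
        : data_(data), width_(width), height_(height), cx_(cx), cy_(cy) {
        assert(static_cast<std::size_t>(width) * height == data.size());
    }

    // Walks the neighborhood in row-major order, from offset (-R, -R) to (R, R).
    class Iterator {
    public:
        Iterator(const Window2D* w, int i) : w_(w), i_(i) {}
        T operator*() const {
            const int dx = i_ % (2 * R + 1) - R;
            const int dy = i_ / (2 * R + 1) - R;
            return w_->at(w_->cx_ + dx, w_->cy_ + dy);
        }
        Iterator& operator++() { ++i_; return *this; }
        bool operator!=(const Iterator& other) const { return i_ != other.i_; }
    private:
        const Window2D* w_;
        int i_;  // linear index within the window
    };

    Iterator begin() const { return Iterator(this, 0); }
    Iterator end() const { return Iterator(this, (2 * R + 1) * (2 * R + 1)); }

private:
    // Clamp-to-edge boundary handling (an assumed policy for this sketch):
    // out-of-range coordinates read the nearest border pixel.
    T at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width_ ? width_ - 1 : x);
        y = y < 0 ? 0 : (y >= height_ ? height_ - 1 : y);
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }

    const std::vector<T>& data_;
    int width_, height_, cx_, cy_;
};

// Hypothetical example kernel: each output pixel is the sum of its 3x3
// neighborhood. The loop body contains no index calculations.
std::vector<int> box_sum_3x3(const std::vector<int>& in, int width, int height) {
    std::vector<int> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int v : Window2D<int, 1>(in, width, height, x, y)) sum += v;
            out[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return out;
}
```

For example, on the 3x3 image {1, ..., 9} the center output is 45, and the top-left output is 21 because its out-of-range neighbors are clamped to the edge. On a GPU, each (x, y) iteration of the outer loops would map to one thread. The point of such an abstraction is that the per-thread code could stay as short as the inner loop here.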

Rubin, Eri, et al. “MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction.” ACM Transactions on Architecture and Code Optimization (TACO) 11.4 (2014): 44.

[Library website](
