AR, DC, LG, NE · Nov 15, 2019

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

arXiv:1911.06859v1 · 31 citations
Originality: Incremental advance
AI Analysis

This work addresses memory management inefficiencies that affect both NPU designers and NPU users, and represents an incremental improvement over GPU-centric address translation schemes.

The paper tackles inefficient address translation in neural processing units (NPUs) by proposing a memory management unit (MMU) tailored to them, which incurs an average performance overhead of only 0.06% relative to an oracular MMU design.
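To make the headline number concrete: an "oracular" MMU can be read as one whose translations cost nothing, so overhead is the extra runtime a real MMU adds on top of it. The sketch below assumes overhead is measured in cycles and averaged arithmetically across workloads; both the averaging method and the cycle counts are hypothetical, chosen only to show how a 0.06% figure would arise.

```python
def normalized_overhead(cycles_with_mmu: int, cycles_oracle: int) -> float:
    """Fractional slowdown of a real MMU versus an oracle MMU whose translations are free."""
    return cycles_with_mmu / cycles_oracle - 1.0


def mean_overhead(runs: list[tuple[int, int]]) -> float:
    """Arithmetic mean of per-workload overheads (the averaging method is an assumption)."""
    return sum(normalized_overhead(m, o) for m, o in runs) / len(runs)


# Hypothetical (cycles_with_mmu, cycles_oracle) pairs for two workloads
runs = [(1_000_400, 1_000_000), (2_001_600, 2_000_000)]
print(f"{mean_overhead(runs):.2%}")
```

The two illustrative workloads slow down by 0.04% and 0.08%, which average to 0.06%.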

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely being utilized for accelerating deep learning algorithms. Similar to how GPUs have evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address space. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06% performance overhead.
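For readers unfamiliar with what "decoupling the virtual and physical memory address space" involves, here is a generic, textbook-style sketch of address translation: a small LRU translation lookaside buffer (TLB) in front of a page table. It is not the NeuMMU design; the 4 KB page size, the TLB capacity, the LRU policy, the flat dictionary standing in for a page-table walk, and the `SimpleMMU` name are all hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KB pages


class SimpleMMU:
    """Hypothetical toy translator: an LRU TLB in front of a flat page table."""

    def __init__(self, page_table: dict[int, int], tlb_entries: int = 4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb: OrderedDict[int, int] = OrderedDict()  # VPN -> PFN, oldest first
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
            self.tlb.move_to_end(vpn)  # mark as most recently used
            pfn = self.tlb[vpn]
        else:
            self.misses += 1
            # Stands in for a costly page-table walk; a missing key acts as a page fault
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
        return pfn * PAGE_SIZE + offset


# Stream a 12 KB tensor in 64-byte chunks; physical frames are deliberately non-contiguous
mmu = SimpleMMU({0: 7, 1: 3, 2: 9}, tlb_entries=2)
paddrs = [mmu.translate(va) for va in range(0, 3 * PAGE_SIZE, 64)]
print(mmu.hits, mmu.misses)
```

A sequential tensor stream like this one misses in the TLB only once per page, which hints at why NPU access patterns differ from the irregular accesses that GPU-centric schemes are tuned for.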

