NE · CV · LG · Jan 28, 2019

A Simple Method to Reduce Off-chip Memory Accesses on Convolutional Neural Networks

arXiv:1901.09614v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues for hardware implementations of neural networks, particularly in mobile or embedded systems, but is incremental as it builds on existing memory optimization techniques.

The authors tackle excessive off-chip memory accesses in convolutional neural networks with a simple algorithm that maximizes on-chip memory usage in a neural processing unit. For Inception-V3 on Samsung's NPU, they report that off-chip memory accesses fall to 1/50 and that feature-map data transfer drops by 97.59%.

For convolutional neural networks, a simple algorithm is proposed that reduces off-chip memory accesses by maximally utilizing on-chip memory in a neural processing unit. In particular, the algorithm provides an effective way to process a module that consists of multiple branches and a merge layer. For Inception-V3 on Samsung's NPU in Exynos, our evaluation shows that the proposed algorithm reduces off-chip memory accesses to 1/50, and accordingly achieves a 97.59% reduction in the amount of feature-map data transferred from/to off-chip memory.
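The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of keeping feature maps on-chip, not the authors' method. It assumes a greedy rule: a layer's output stays on-chip if it fits in the remaining capacity, and is spilled to off-chip memory otherwise. The names `Layer` and `offchip_traffic`, the sizes, and the toy branch-and-merge module are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical layer record: name, output feature-map size, producers."""
    name: str
    out_size: int
    inputs: list = field(default_factory=list)


def offchip_traffic(layers, capacity):
    """Count feature-map units moved to/from off-chip memory.

    Assumption (not from the paper): layers are given in execution order,
    and an output is kept on-chip whenever it fits in the free capacity.
    """
    size = {l.name: l.out_size for l in layers}
    # How many consumers still need each feature map.
    pending = {l.name: 0 for l in layers}
    for l in layers:
        for src in l.inputs:
            pending[src] += 1

    on_chip, used, traffic = set(), 0, 0
    for l in layers:
        # Inputs that are not resident on-chip must be read from off-chip.
        for src in l.inputs:
            if src not in on_chip:
                traffic += size[src]
        if pending[l.name] == 0:
            # Final output of the module: always written off-chip here.
            traffic += l.out_size
        elif used + l.out_size <= capacity:
            on_chip.add(l.name)
            used += l.out_size
        else:
            # Does not fit: spill the output to off-chip memory.
            traffic += l.out_size
        # Free inputs once their last consumer has run.
        for src in l.inputs:
            pending[src] -= 1
            if pending[src] == 0 and src in on_chip:
                on_chip.remove(src)
                used -= size[src]
    return traffic


# Toy module with two branches and a merge (concat) layer.
module = [
    Layer("input", 100),
    Layer("branch_a", 40, ["input"]),
    Layer("branch_b", 60, ["input"]),
    Layer("concat", 100, ["branch_a", "branch_b"]),
]

print(offchip_traffic(module, capacity=0))    # no on-chip buffer
print(offchip_traffic(module, capacity=200))  # everything fits on-chip
```

On this toy module the count drops from 600 units with no on-chip buffer to 100 units when all intermediate feature maps fit, since only the merged output is written out. The sketch ignores weights, tiling, and the input image load, all of which a real NPU scheduler would have to account for.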

