IVLG, Jul 16, 2022

Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

arXiv:2207.07931v2 | 6 citations | h-index: 13
AI Analysis

This work addresses memory constraints for deploying CNNs on resource-constrained edge devices, representing an incremental advance in activation compression techniques.

The paper tackles the problem of reducing memory bandwidth for activation data in CNNs on edge devices by proposing a learnable co-design system that combines mixed-precision and dimension reduction. Compared with existing mixed-precision methods, it improves accuracy by 3.54% on ResNet18 and 1.27% on MobileNetv2 while saving 0.18 and 2.02 bits per value, respectively.

Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth needed to transmit large intermediate data, i.e., activations, during inference. Existing research uses mixed-precision and dimension reduction to reduce computational complexity but pays less attention to their application to activation compression. To further exploit the redundancy in activations, we propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges the search space and finds the optimal bit-width allocation automatically. Our experimental results show that the proposed methods improve accuracy by 3.54%/1.27% and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.
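
To make the idea of group-wise compression policies concrete, here is a minimal Python sketch. It is not the paper's method. The importance scores, the equal-size grouping rule, the bit-widths, and the use of 0 bits as a crude stand-in for dropping a group are all illustrative assumptions. The function names are hypothetical too. The sketch only shows how assigning different bit-widths to channel groups translates into an average number of bits per activation value.

```python
from typing import List, Sequence


def uniform_quantize(values: Sequence[float], bits: int) -> List[float]:
    """Quantize values to 2**bits evenly spaced levels over their own min/max range."""
    if bits <= 0:
        # Assumption: 0 bits means the group is dropped and reconstructed as zeros.
        return [0.0 for _ in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def group_channels(importance: Sequence[float], num_groups: int) -> List[List[int]]:
    """Rank channel indices by importance (descending) and split them into
    at most num_groups groups of near-equal size."""
    order = sorted(range(len(importance)), key=lambda c: importance[c], reverse=True)
    size = max(1, -(-len(order) // num_groups))  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def average_bits(groups: Sequence[Sequence[int]], bit_widths: Sequence[int]) -> float:
    """Average bits per value, assuming every channel holds the same number of values."""
    total_channels = sum(len(g) for g in groups)
    total_bits = sum(len(g) * b for g, b in zip(groups, bit_widths))
    return total_bits / total_channels


# Hypothetical per-channel importance scores for an 8-channel activation.
importance = [0.9, 0.1, 0.5, 0.3, 0.7, 0.05, 0.2, 0.6]
groups = group_channels(importance, num_groups=4)

# Hypothetical policy: more important groups keep more bits.
bit_widths = [8, 4, 2, 0]
avg = average_bits(groups, bit_widths)
print(groups)
print(avg)  # 3.5 bits per value for this toy allocation
```

In the paper the bit-width allocation is learned and searched automatically. The fixed `bit_widths` list above only stands in for the result of such a search.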

