LG · ML · Dec 15, 2017

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

arXiv:1712.05877v1 · 4280 citations
Originality: Highly original
AI Analysis

This paper addresses the need for efficient inference on mobile devices by offering a practical way to deploy models on integer-only hardware.

The paper tackles the problem of efficient on-device inference for deep learning models by proposing a quantization scheme that enables integer-only arithmetic, co-designed with a training procedure to maintain accuracy. The result is improved accuracy-latency tradeoffs, demonstrated on MobileNets with ImageNet classification and COCO detection on CPUs.

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
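To make the idea of integer-only inference concrete, here is a minimal sketch of the affine mapping the paper builds on, r ≈ S(q − Z), where a real value r is represented by an integer q, a real-valued scale S, and an integer zero-point Z. The function names, the unsigned 8-bit range [0, 255], and the pure-Python style are assumptions made for illustration, not the authors' implementation. The sketch also leaves the final rescaling in floating point, whereas the paper performs that step with a fixed-point multiplier.

```python
# Illustrative sketch of affine 8-bit quantization: r ~= S * (q - Z).
# All names and the [0, 255] range are assumptions, not the paper's code.

def choose_params(rmin, rmax, qmin=0, qmax=255):
    """Pick a scale S and zero-point Z for the real range [rmin, rmax]."""
    # Widen the range so that real 0.0 is always representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin  # degenerate range: everything is zero
    scale = (rmax - rmin) / (qmax - qmin)
    # Round Z to an integer so that real 0.0 maps to it exactly.
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to integers, clamping to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate real values from their integer codes."""
    return [scale * (q - zero_point) for q in q_values]


def int_dot(qa, za, qb, zb):
    """Accumulate sum((qa - za) * (qb - zb)) using integer arithmetic only."""
    return sum((a - za) * (b - zb) for a, b in zip(qa, qb))


if __name__ == "__main__":
    a, b = [0.5, -0.25, 1.0], [0.1, 0.9, -0.4]
    sa, za = choose_params(min(a), max(a))
    sb, zb = choose_params(min(b), max(b))
    qa, qb = quantize(a, sa, za), quantize(b, sb, zb)
    # The integer accumulator is rescaled once at the end by S_a * S_b.
    approx = sa * sb * int_dot(qa, za, qb, zb)
    exact = sum(x * y for x, y in zip(a, b))
    print(f"exact={exact:.4f} approx={approx:.4f}")
```

The multiply-accumulate work in `int_dot` touches only integers. The single multiplication by `sa * sb` is the one place where a real-valued factor appears, which is why it can be handled separately from the integer loop.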

Code Implementations: 21 repos

Data from Papers with Code (CC-BY-SA-4.0)

