CV · AI · LG · Mar 19, 2019

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks

arXiv:1903.08066v3 · 167 citations · Has Code
AI Analysis

This work addresses the need for efficient hardware deployment of neural networks, though it is incremental in that it builds on prior quantization methods.

The paper tackles accurate and efficient fixed-point inference for deep neural networks by training quantization thresholds (TQT) with backpropagation, achieving near-floating-point accuracy on networks such as MobileNets with fewer than 5 epochs of 8-bit retraining.

We propose a method of training quantization thresholds (TQT) for uniform symmetric quantizers using standard backpropagation and gradient descent. Contrary to prior work, we show that a careful analysis of the straight-through estimator for threshold gradients allows for a natural range-precision trade-off leading to better optima. Our quantizers are constrained to use power-of-2 scale factors and per-tensor scaling of weights and activations to make them amenable to hardware implementation. We present analytical support for the general robustness of our methods and empirically validate them on various CNNs for ImageNet classification. We are able to achieve near-floating-point accuracy on traditionally difficult networks such as MobileNets with fewer than 5 epochs of quantized (8-bit) retraining. Finally, we present Graffitist, a framework that enables automatic quantization of TensorFlow graphs for TQT (available at https://github.com/Xilinx/graffitist ).
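To make the abstract's quantizer concrete, here is a minimal scalar sketch of a uniform symmetric, signed quantizer with a power-of-2 scale derived from a trained log2 threshold, together with straight-through-estimator (STE) gradients with respect to the input and the log2 threshold. It reflects our reading of the TQT formulation: the scale is taken as 2^ceil(log2 t) / 2^(b-1), and the gradient passes straight through the ceil. The function names `pow2_scale`, `quantize`, and `ste_grads` are hypothetical and are not Graffitist's API. The use of Python's round-half-to-even rounding is also an assumption; check the paper and the Graffitist repository for the exact definitions.

```python
import math


def pow2_scale(log2_t: float, bits: int) -> float:
    """Power-of-2 scale factor: ceil the log2 threshold, then divide by 2^(bits-1)."""
    return 2.0 ** math.ceil(log2_t) / 2.0 ** (bits - 1)


def quantize(x: float, log2_t: float, bits: int = 8) -> float:
    """Uniform symmetric signed quantizer with a single (per-tensor) power-of-2 scale."""
    s = pow2_scale(log2_t, bits)
    n, p = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # signed integer range
    q = min(max(round(x / s), n), p)  # round (half-to-even, an assumption), then clip
    return q * s


def ste_grads(x: float, log2_t: float, bits: int = 8) -> tuple:
    """STE gradients (d quantize / dx, d quantize / d log2_t), treating round and ceil as identity."""
    s = pow2_scale(log2_t, bits)
    n, p = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    r = round(x / s)
    ln2 = math.log(2.0)
    if r < n:  # clipped below: no input gradient, threshold gradient pushes on range
        return 0.0, s * ln2 * n
    if r > p:  # clipped above
        return 0.0, s * ln2 * p
    # in range: input passes straight through; threshold gradient tracks rounding error
    return 1.0, s * ln2 * (r - x / s)
```

For example, with `log2_t = 0.0` (threshold 1.0) and 8 bits the scale is 1/128, so `quantize(2.0, 0.0)` saturates at 127/128 while `quantize(0.5, 0.0)` is exact. The two branches of `ste_grads` illustrate the range-precision trade-off described above: in-range values contribute a rounding-error (precision) term to the threshold gradient, while clipped values contribute a term proportional to the range limit.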

Code Implementations: 2 repos