LG · ML · Mar 4, 2019

Learning low-precision neural networks without Straight-Through Estimator (STE)

arXiv:1903.01061v2 · 39 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of quantizing neural networks for efficient deployment, offering a method that avoids STE approximations and provides incremental accuracy gains.

The paper addresses the training of low-precision neural networks by proposing alpha-blending (AB) as an alternative to the Straight-Through Estimator (STE), which lacks a complete theoretical understanding. Compared to STE-based quantization, it reports top-1 accuracy improvements of 0.9%, 0.82%, and 2.93% across the evaluated models and datasets.

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). Our method (AB) avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination of the quantized weight $w_q$ and the corresponding full-precision weight $w$, with non-trainable scalar coefficients $α$ and $1-α$. During training, $α$ is gradually increased from 0 to 1; the gradient updates to the weights pass through the full-precision term, $(1-α)w$, of the affine combination; and the model is converted from full precision to low precision progressively. To evaluate the method, a 1-bit BinaryNet on the CIFAR10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet_50 v1/2 on the ImageNet dataset are trained using the alpha-blending approach, and the evaluation indicates that AB improves top-1 accuracy by 0.9%, 0.82% and 2.93% respectively compared to the results of STE-based quantization.
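The blending step in the abstract can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the authors' implementation: the sign quantizer, the linear schedule for $α$, the toy least-squares problem, the learning rate, and every function name below are hypothetical choices made for the example. The loss is evaluated at $(1-α)w + αw_q$, and the gradient reaches $w$ only through the $(1-α)w$ term, with $w_q$ treated as a constant.

```python
def quantize_sign(w):
    """1-bit quantizer (assumed for illustration): map each weight to +1.0 or -1.0."""
    return [1.0 if wi >= 0.0 else -1.0 for wi in w]


def alpha_blend(w, alpha):
    """Affine combination (1 - alpha) * w + alpha * w_q used in place of w_q in the loss."""
    w_q = quantize_sign(w)
    return [(1.0 - alpha) * wi + alpha * qi for wi, qi in zip(w, w_q)]


def linear_alpha(step, total_steps):
    """Assumed schedule: alpha rises linearly from 0 to 1 over training."""
    return min(1.0, step / float(max(1, total_steps - 1)))


def train(data, w, total_steps=200, lr=0.05):
    """Gradient descent on a mean-squared-error loss evaluated at the blended weights."""
    for step in range(total_steps):
        alpha = linear_alpha(step, total_steps)
        w_ab = alpha_blend(w, alpha)
        # Gradient of the loss with respect to the blended weights.
        grad_ab = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w_ab, x)) - y
            for i, xi in enumerate(x):
                grad_ab[i] += 2.0 * err * xi / len(data)
        # Chain rule through the full-precision term only:
        # d(w_ab)/d(w) = (1 - alpha), with w_q held constant.
        w = [wi - lr * (1.0 - alpha) * gi for wi, gi in zip(w, grad_ab)]
    return w


# Toy data generated by the binary weights [1, -1] (hypothetical example).
DATA = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0), ([1.0, -1.0], 2.0)]

if __name__ == "__main__":
    w_final = train(DATA, [0.3, -0.2])
    print("full-precision weights:", w_final)
    print("deployed 1-bit weights:", quantize_sign(w_final))
```

At $α = 0$ the blended weights equal the full-precision weights, so training starts as ordinary SGD. At $α = 1$ the loss sees only the quantized weights and the update vanishes, which is why the schedule ends there.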

