LG
Dec 10, 2020

Recurrence of Optimum for Training Weight and Activation Quantized Networks

arXiv:2012.05529v2
6 citations
AI Analysis

This work provides a theoretical foundation for the optimization of fully quantized neural networks, which is important for researchers and practitioners developing efficient AI models for resource-constrained platforms.

This paper addresses the challenging optimization problem of training deep neural networks with low-precision weights and activations. The authors prove that a projected gradient-like algorithm, using a 'fake' gradient, causes the sequence of quantized weights to recurrently visit the global optimum of the discrete minimization problem for training fully quantized networks.

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for minimizing a stage-wise loss function subject to a discrete set-constraint. While numerous training methods have been proposed, existing studies for full quantization of DNNs are mostly empirical. From a theoretical point of view, we study practical techniques for overcoming the combinatorial nature of network quantization. Specifically, we investigate a simple yet powerful projected gradient-like algorithm for quantizing two-linear-layer networks, which proceeds by repeatedly taking one step on the float weights in the negative direction of a heuristic 'fake' gradient of the loss function (the so-called coarse gradient) evaluated at the quantized weights. For the first time, we prove that under mild conditions, the sequence of quantized weights recurrently visits the global optimum of the discrete minimization problem for training a fully quantized network. We also show numerical evidence of the recurrence phenomenon of weight evolution in training quantized deep networks.
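The update loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact setup: the binary weight set {-1, +1}, the 0/1 step activation, the squared loss, the ReLU-derivative surrogate, and all function names (`quantize_weights`, `coarse_gradient`, `train`) are hypothetical choices made here for concreteness.

```python
def quantize_weights(w_float):
    """Project float weights onto {-1, +1} elementwise (0 maps to +1). Assumed quantizer."""
    return [[1.0 if w >= 0 else -1.0 for w in row] for row in w_float]

def step_activation(z):
    """Binary activation; its true derivative is zero almost everywhere."""
    return 1.0 if z > 0 else 0.0

def forward(w, v, x):
    """Two-linear-layer network v . step(W x). Returns (prediction, pre-activations)."""
    z = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    pred = sum(vj * step_activation(zj) for vj, zj in zip(v, z))
    return pred, z

def coarse_gradient(w_q, v, data):
    """'Fake' gradient of the mean squared loss w.r.t. the first-layer weights at w_q.

    The step function's derivative is replaced by the ReLU derivative (1 if z > 0),
    a straight-through-style surrogate assumed for this sketch.
    """
    grad = [[0.0] * len(row) for row in w_q]
    for x, y in data:
        pred, z = forward(w_q, v, x)
        residual = pred - y
        for j, zj in enumerate(z):
            surrogate = 1.0 if zj > 0 else 0.0
            for i, xi in enumerate(x):
                grad[j][i] += residual * v[j] * surrogate * xi / len(data)
    return grad

def train(w_float, v, data, lr=0.1, steps=100):
    """Step the float weights using the coarse gradient taken at the quantized weights."""
    history = []  # quantized weights visited at each iteration
    for _ in range(steps):
        w_q = quantize_weights(w_float)
        history.append(w_q)
        g = coarse_gradient(w_q, v, data)
        w_float = [[w - lr * gw for w, gw in zip(w_row, g_row)]
                   for w_row, g_row in zip(w_float, g)]
    return w_float, history
```

The point to notice is that the gradient is evaluated at the quantized weights while the step is applied to the float weights, so the float weights act as an accumulator and the quantized iterate can leave and later return to the same point. Inspecting `history` for repeated visits to a particular quantized configuration is one simple way to observe the kind of recurrence the abstract refers to; the sketch itself makes no claim about when such visits occur.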
