LG · ML · Oct 1, 2018

ProxQuant: Quantized Neural Networks via Proximal Operators

arXiv:1810.00861v3 · 131 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient deep learning on mobile devices by offering a more stable and more principled alternative to existing quantization methods. The advance is incremental, since it builds on known optimization techniques.

The paper tackles the problem of training quantized neural networks for resource-constrained environments by proposing ProxQuant, a method that formulates quantization as a regularized learning problem and optimizes it with the prox-gradient method. It outperforms the state of the art on binary quantization for ResNets and LSTMs, and theoretical and experimental evidence shows that it is more stable than the straight-through gradient method.
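The two update rules being compared can be written side by side. The specific binary regularizer R below (a "W-shaped" penalty that is zero exactly at ±1) and the increasing schedule for λ_t are recalled from the ProxQuant paper rather than stated on this page, so treat them as assumptions; \widetilde{\nabla} L denotes a stochastic gradient and η_t a step size.

```latex
\[
\min_{\theta \in \mathbb{R}^d} \; L(\theta) + \lambda R(\theta),
\qquad
R(\theta) = \sum_{j=1}^{d} \min\{\lvert \theta_j - 1 \rvert,\; \lvert \theta_j + 1 \rvert\}
\]
\[
\text{ProxQuant:}\quad
\theta_{t+1} = \operatorname{prox}_{\eta_t \lambda_t R}\bigl(\theta_t - \eta_t \widetilde{\nabla} L(\theta_t)\bigr),
\qquad
\operatorname{prox}_{\tau R}(v) = \arg\min_{\theta} \tfrac{1}{2}\lVert \theta - v \rVert_2^2 + \tau R(\theta)
\]
\[
\text{BinaryConnect (straight-through):}\quad
\theta_{t+1} = \theta_t - \eta_t \widetilde{\nabla} L\bigl(\operatorname{sign}(\theta_t)\bigr)
\]
```

In the first rule the gradient is taken at the full-precision point and quantizedness is encouraged only through the prox step; in the second the gradient is taken at the quantized point but accumulated into the full-precision vector.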

To make deep neural networks feasible in resource-constrained environments (such as mobile devices), it is beneficial to quantize models by using low-precision weights. One common technique for quantizing neural networks is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its empirical success, little is understood about why the straight-through gradient method works. Building upon a novel observation that the straight-through gradient method is in fact identical to the well-known Nesterov's dual-averaging algorithm on a quantization constrained optimization problem, we propose a more principled alternative approach, called ProxQuant, that formulates quantized network training as a regularized learning problem instead and optimizes it via the prox-gradient method. ProxQuant does back-propagation on the underlying full-precision vector and applies an efficient prox-operator in between stochastic gradient steps to encourage quantizedness. For quantizing ResNets and LSTMs, ProxQuant outperforms state-of-the-art results on binary quantization and is on par with state-of-the-art on multi-bit quantization. For binary quantization, our analysis shows both theoretically and experimentally that ProxQuant is more stable than the straight-through gradient method (i.e. BinaryConnect), challenging the indispensability of the straight-through gradient method and providing a powerful alternative.
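A minimal NumPy sketch of the two binary-quantization updates described above, run on a toy least-squares problem whose true weights are ±1. Everything here is an illustrative assumption, not the authors' code: the helper names (`prox_binary`, `proxquant_step`, `binaryconnect_step`, `lam0`) are hypothetical, the regularizer is the W-shaped penalty min(|θ_j − 1|, |θ_j + 1|), and λ_t grows linearly with t as a simple homotopy schedule.

```python
import numpy as np


def nearest_sign(theta):
    # Map every coordinate to the nearest point of {-1, +1} (ties go to +1).
    return np.where(theta >= 0, 1.0, -1.0)


def prox_binary(theta, tau):
    # Prox of tau * R(theta), with R(theta) = sum_j min(|theta_j - 1|, |theta_j + 1|)
    # (assumed W-shaped regularizer): each coordinate is soft-thresholded toward
    # its nearest sign by at most tau, and snaps exactly onto it once within tau.
    s = nearest_sign(theta)
    r = theta - s
    return s + np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)


def grad(w, X, y):
    # Gradient of the toy least-squares loss 0.5 / n * ||X w - y||^2.
    return X.T @ (X @ w - y) / len(y)


def binaryconnect_step(theta, lr, X, y):
    # Straight-through update: the gradient is evaluated at sign(theta) but applied
    # to full-precision theta, so theta is a running sum of past gradients
    # (the dual-averaging view mentioned in the abstract).
    return theta - lr * grad(nearest_sign(theta), X, y)


def proxquant_step(theta, lr, lam, X, y):
    # Prox-gradient update: ordinary gradient step on full-precision theta,
    # followed by the prox operator with threshold lr * lam.
    return prox_binary(theta - lr * grad(theta, X, y), lr * lam)


rng = np.random.default_rng(0)
n, d = 400, 8
X = rng.normal(size=(n, d))
w_true = nearest_sign(rng.normal(size=d))  # binary target weights
y = X @ w_true                             # noiseless toy targets

theta_pq = 0.1 * rng.normal(size=d)  # small full-precision initialization
theta_bc = theta_pq.copy()
lr, lam0 = 0.1, 0.01
for t in range(1, 301):
    # Assumed homotopy schedule: lambda_t = lam0 * t, so the prox grows stronger.
    theta_pq = proxquant_step(theta_pq, lr, lam0 * t, X, y)
    theta_bc = binaryconnect_step(theta_bc, lr, X, y)

print("ProxQuant weights:     ", theta_pq)
print("BinaryConnect weights: ", nearest_sign(theta_bc))
```

Early on, when λ_t is small, the ProxQuant iterate behaves like plain gradient descent on full-precision weights. As λ_t grows, the prox pulls every coordinate onto ±1, so the final iterate is exactly binary without a separate rounding step. This toy problem only illustrates the mechanics; it says nothing about the stability results reported for ResNets and LSTMs.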

Code Implementations: 1 repo