Additive Noise Annealing and Approximation Properties of Quantized Neural Networks
This work addresses the challenge of deploying neural networks efficiently on resource-constrained devices by improving quantization methods. The contribution is incremental, however, since it builds on existing techniques such as the straight-through estimator.
The paper tackles the problem of training quantized neural networks by proposing a novel gradient-based algorithm based on additive noise annealing. The algorithm achieves state-of-the-art performance for ternary AlexNet and MobileNetV2 networks on the CIFAR-10 and ImageNet benchmarks.
We present a theoretical and experimental investigation of the quantization problem for artificial neural networks. We provide a mathematical definition of quantized neural networks and analyze their approximation capabilities, showing in particular that any Lipschitz-continuous map defined on a hypercube can be uniformly approximated by a quantized neural network. We then focus on the regularization effect of additive noise on the arguments of multi-step functions inherent to the quantization of continuous variables. In particular, applying the expectation operator to a non-differentiable multi-step random function yields a differentiable function whenever the underlying probability density is differentiable (in either the classical or the weak sense), and explicit bounds on the Lipschitz constant of this function can be given. Based on these results, we propose a novel gradient-based training algorithm for quantized neural networks that generalizes the straight-through estimator, acting on noise applied to the network's parameters. We evaluate our algorithm on the CIFAR-10 and ImageNet image classification benchmarks, showing state-of-the-art performance on AlexNet and MobileNetV2 for ternary networks.
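The smoothing effect described above can be illustrated on a single ternary quantizer. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes a ternary quantizer with threshold `delta`, additive noise uniform on [-sigma, sigma], and a geometric annealing schedule for sigma. The names `ternary`, `smoothed_ternary`, `smoothed_ternary_grad` and `train_toy` are hypothetical and do not come from the paper. For symmetric noise with CDF F, the expectation of Q(x + nu) equals F(x - delta) + F(x + delta) - 1, which is continuous even though Q is not. Its derivative is the sum of two shifted densities, so its Lipschitz constant is at most 1/sigma. The toy loop is a hypothetical STE-style variant: it quantizes hard in the forward pass and uses the derivative of the smoothed quantizer in the backward pass.

```python
import numpy as np

# Assumed ternary quantizer with threshold delta (illustrative choice):
# Q(x) = -1 if x < -delta, 0 if |x| <= delta, +1 if x > delta,
# i.e. Q(x) = H(x - delta) + H(x + delta) - 1 with H the unit step.
def ternary(x, delta=0.5):
    return np.sign(x) * (np.abs(x) > delta)


def uniform_cdf(t, sigma):
    # CDF of nu ~ Uniform[-sigma, sigma] (noise law assumed for this sketch)
    return np.clip((t + sigma) / (2.0 * sigma), 0.0, 1.0)


def smoothed_ternary(x, delta=0.5, sigma=0.2):
    # E[Q(x + nu)] = F(x - delta) + F(x + delta) - 1 for symmetric noise:
    # a continuous, piecewise-linear function, unlike Q itself.
    return uniform_cdf(x - delta, sigma) + uniform_cdf(x + delta, sigma) - 1.0


def smoothed_ternary_grad(x, delta=0.5, sigma=0.2):
    # d/dx E[Q(x + nu)] = f(x - delta) + f(x + delta), f = noise density,
    # so the Lipschitz constant is at most 2 * sup f = 1 / sigma.
    def density(t):
        return (np.abs(t) <= sigma) / (2.0 * sigma)
    return density(x - delta) + density(x + delta)


def train_toy(target, steps=50, lr=0.2, sigma0=1.0, decay=0.9,
              sigma_min=1e-3, delta=0.5):
    # Hypothetical STE-style loop (not the paper's method): forward with the
    # hard quantizer Q(w), backward with the smoothed derivative, and anneal
    # the noise scale sigma geometrically toward sigma_min.
    w = np.zeros_like(target, dtype=float)  # latent real-valued weights
    for k in range(steps):
        sigma = max(sigma0 * decay**k, sigma_min)
        q = ternary(w, delta)
        grad_q = q - target  # gradient of 0.5 * ||Q(w) - target||^2 w.r.t. Q(w)
        w -= lr * grad_q * smoothed_ternary_grad(w, delta, sigma)
    return w


# Monte Carlo check that the closed form matches the expectation.
rng = np.random.default_rng(0)
x = np.linspace(-1.5, 1.5, 61)
noise = rng.uniform(-0.2, 0.2, size=(20_000, 1))
mc = ternary(x[None, :] + noise).mean(axis=0)
max_err = np.max(np.abs(mc - smoothed_ternary(x)))

w = train_toy(np.array([1.0, 0.0, -1.0]))
print(max_err < 0.05, ternary(w))
```

With these settings, the Monte Carlo estimate agrees with the closed form. The toy loop drives Q(w) to the target ternary vector within a few steps, after which the quantization error, and hence the update, vanishes. With uniform noise the surrogate gradient is zero outside the intervals of half-width sigma around plus and minus `delta`. This is one reason to start the annealing from a large sigma.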