Oscillations Make Neural Networks Robust to Quantization
This work addresses the challenge of maintaining neural network accuracy under quantization, in particular finding weights that remain robust when the bit width of the quantizer changes. It is an incremental advance over current methods, which struggle to match QAT accuracy at specific bit widths.
The paper tackles accuracy degradation under quantization by proposing a novel regularization method that induces weight oscillations to improve robustness. The method matches QAT accuracy at >= 3-bit weight quantization and stays close to full-precision accuracy at higher bit widths.
We challenge the prevailing view that oscillations in Quantization Aware Training (QAT) are merely undesirable artifacts caused by the Straight-Through Estimator (STE). Through theoretical analysis of QAT in linear models, we demonstrate that the gradient of the loss function can be decomposed into two terms: the gradient of the original full-precision loss and a term that causes quantization oscillations. Based on these insights, we propose a novel regularization method that induces oscillations to improve quantization robustness. In contrast to traditional methods that focus on minimizing the effects of oscillations, our approach leverages the beneficial aspects of weight oscillations to preserve model performance under quantization. Our empirical results on ResNet-18 and Tiny ViT demonstrate that this counter-intuitive strategy matches QAT accuracy at >= 3-bit weight quantization, while maintaining close to full-precision accuracy at bit widths above the target bit width. Our work therefore provides a new perspective on model preparation for quantization, particularly for finding weights that are robust to changes in the bit width of the quantizer, an area where current methods struggle to match the accuracy of QAT at specific bit widths.
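To make the oscillation phenomenon concrete, the following is a minimal sketch of a one-weight toy problem, not the regularizer or the derivation from the paper. It assumes a round-to-nearest quantizer with step size 1, a squared-error loss on the quantized weight, and an STE that treats the quantizer's derivative as 1. All names (`quantize`, `ste_grad`, `grad_terms`, `train`, `count_flips`) and the values of `target` and `lr` are hypothetical choices for illustration. In this toy setting the STE gradient splits into a full-precision term and a quantization-error term, which loosely mirrors the two-term decomposition described above.

```python
import math


def quantize(w, step=1.0):
    """Round-to-nearest uniform quantizer (assumed form, ties round up)."""
    return step * math.floor(w / step + 0.5)


def ste_grad(w, target, step=1.0):
    """STE gradient of 0.5 * (quantize(w) - target)**2 with respect to w.

    The STE treats d quantize(w) / d w as 1, so the gradient is just the
    residual of the quantized weight.
    """
    return quantize(w, step) - target


def grad_terms(w, target, step=1.0):
    """Split the STE gradient into two parts (illustrative for this toy only).

    full_precision: gradient of 0.5 * (w - target)**2
    quant_error:    quantize(w) - w, which changes sign at a rounding threshold
    """
    full_precision = w - target
    quant_error = quantize(w, step) - w
    return full_precision, quant_error


def train(w0=0.0, target=0.3, lr=0.1, steps=200):
    """Plain gradient descent on the latent weight using the STE gradient."""
    w = w0
    history = []
    for _ in range(steps):
        w -= lr * ste_grad(w, target)
        history.append(quantize(w))
    return w, history


def count_flips(history):
    """Number of times the quantized weight changes between consecutive steps."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


if __name__ == "__main__":
    w_final, history = train()
    print("final latent weight:", w_final)
    print("quantized-weight flips:", count_flips(history))
```

Because the target 0.3 lies between the grid points 0 and 1, neither quantized value gives a zero gradient. The latent weight is pushed up while it quantizes to 0 and pushed down while it quantizes to 1, so it hovers near the rounding threshold at 0.5 and the quantized weight keeps switching between the two grid points.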