LG · Jun 13, 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany

arXiv:2206.06501v1 · 19.254 citations · h-index: 90

Originality: Highly original

AI Analysis

This work addresses the challenge of maintaining accuracy in low-precision neural network training, which is critical for efficient deployment on resource-constrained devices, though it is incremental as it builds on existing QAT frameworks.

The paper tackles the problem of suboptimal clipping thresholds in quantization-aware training (QAT) by proposing OCTAV, a recursive algorithm that determines MSE-optimal clipping scalars on the fly, and introduces magnitude-aware differentiation to improve gradient estimation. The method achieves state-of-the-art accuracy, preserving performance at low precision (4 to 6 bits) on tasks such as ImageNet training with ResNets and MobileNets and BERT fine-tuning on SQuAD.

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training from scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning using BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4 to 6 bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.
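
The abstract does not spell out the recursion, so the following is only a minimal sketch of how a Newton-Raphson iteration for an MSE-optimal clipping scalar could look. It assumes a noise model that is not stated on this page: values with magnitude at most s incur uniform quantization noise of variance s² · 4^(-B) / 3 (step size 2s / 2^B for B bits), and values above s incur the clipping error (|x| - s)². Treating the in-range and clipped sets as fixed during each step, one Newton update on that objective reduces to the ratio used below. The function name `mse_optimal_clip`, the starting point, and the iteration count are hypothetical choices for illustration.

```python
import random


def mse_optimal_clip(values, bits, iters=10):
    """Illustrative Newton-Raphson-style search for a clipping scalar.

    Assumed objective (not taken from this page):
        J(s) = (4**-bits / 3) * s**2 * #{|x| <= s} + sum((|x| - s)**2 for |x| > s)
    """
    if not values:
        raise ValueError("values must be non-empty")
    mags = [abs(v) for v in values]
    noise_coeff = 4.0 ** (-bits) / 3.0   # quantization-noise factor per in-range value
    s = sum(mags) / len(mags)            # hypothetical starting point: mean magnitude
    for _ in range(iters):
        clipped = [m for m in mags if m > s]
        if not clipped:                  # nothing is clipped: no further update possible
            break
        n_in = len(mags) - len(clipped)
        # Newton step s - J'(s) / J''(s), simplified algebraically
        s = sum(clipped) / (noise_coeff * n_in + len(clipped))
    return s


# Usage: synthetic Gaussian "tensor"; compare the scalar found at two bit widths.
random.seed(0)
tensor = [random.gauss(0.0, 1.0) for _ in range(10_000)]
for b in (4, 8):
    print(b, "bits -> clipping scalar", mse_optimal_clip(tensor, b))
```

Because each update only needs a masked sum and two counts over the tensor, a recursion of this shape is cheap enough to run for every tensor at every training step, which is consistent with the "on the fly" claim in the abstract.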
