CV MM Mar 11, 2024

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin

arXiv:2403.06497v15.23 citations h-index: 1 MIPR

Originality: Incremental advance

AI Analysis

This paper addresses the problem of maintaining inference accuracy in quantized models for CV and NLP applications, offering an incremental improvement over existing calibration techniques.

The paper tackles accuracy drops in post-training linear quantization of Transformer-based models. It finds that, on average, 65% of quantization errors stem from outliers, which amplify the dynamic range and cost precision. It proposes QuantTune, a fine-tuning method that adjusts weights to constrain outlier activations, reducing accuracy drops by up to 33.8% (at 7-bit quantization) compared to top calibration methods.

Transformer-based models have gained widespread popularity in both the computer vision (CV) and natural language processing (NLP) fields. However, significant challenges arise during post-training linear quantization, leading to noticeable reductions in inference accuracy. Our study focuses on uncovering the underlying causes of these accuracy drops and proposing a quantization-friendly fine-tuning method, QuantTune. Firstly, our analysis revealed that, on average, 65% of quantization errors result from the precision loss incurred by the dynamic range amplification effect of outliers across the target Transformer-based models. Secondly, QuantTune adjusts weights based on the deviation of outlier activations and effectively constrains the dynamic ranges of the problematic activations. As a result, it successfully mitigates the negative impact of outliers on the inference accuracy of quantized models. Lastly, QuantTune can be seamlessly integrated into the back-propagation pass in the fine-tuning process without requiring extra complexity in inference software and hardware design. Our approach showcases significant improvements in post-training quantization across a range of Transformer-based models, including ViT, BERT-base, and OPT. QuantTune reduces accuracy drops by 12.09% at 8-bit quantization and 33.8% at 7-bit compared to top calibration methods, outperforming state-of-the-art solutions by over 18.84% across ViT models.
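The dynamic range amplification effect described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' code or the QuantTune method. It assumes symmetric per-tensor linear quantization, synthetic Gaussian activations, and a single hypothetical outlier of magnitude 60, all chosen purely for illustration. It shows how one outlier stretches the quantization scale and raises the rounding error on every typical value.

```python
import random


def quantize_dequantize(values, num_bits=8):
    """Symmetric per-tensor linear quantization followed by dequantization.

    Illustrative assumption: the scale is set by the largest magnitude in
    the tensor, so a single outlier widens the step size for all values.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)       # dynamic range of the tensor
    scale = max_abs / qmax if max_abs > 0 else 1.0
    result = []
    for v in values:
        q = round(v / scale)                    # map to the integer grid
        q = max(-qmax, min(qmax, q))            # clamp to the representable range
        result.append(q * scale)                # map back to floating point
    return result


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic "typical" activations (assumption: zero-mean, unit-variance Gaussian).
activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
# The same activations plus one hypothetical outlier.
with_outlier = activations + [60.0]

# Error on the typical values when no outlier is present.
mse_clean = mean_squared_error(activations, quantize_dequantize(activations))
# Error on the same typical values when the outlier sets the scale.
dequantized = quantize_dequantize(with_outlier)
mse_outlier = mean_squared_error(activations, dequantized[:-1])

print(f"MSE without outlier: {mse_clean:.6f}")
print(f"MSE with outlier:    {mse_outlier:.6f}")
```

In this sketch the error is measured on the same 10,000 typical values in both cases, so any increase comes only from the wider step size that the outlier forces. Constraining the range of such activations during fine-tuning, as the abstract describes, would shrink that step size again.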
