DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
This work targets the efficient deployment of Vision Transformers by improving the accuracy of post-training quantization (PTQ). The contribution is incremental, since it builds on existing PTQ methods.
The paper tackles performance degradation in low-bit post-training quantization of Vision Transformers, which it attributes to two causes: quantizers that do not align with the power-law distribution of post-Softmax activations, and outliers in the scaling factors used to reparameterize post-LayerNorm activations. The resulting method, DopQ-ViT, outperforms previous PTQ methods on classification and detection tasks, with gains of up to 2.1% higher accuracy on ImageNet.
Vision Transformers (ViTs) have gained significant attention, but their high computational cost limits their practical application. While post-training quantization (PTQ) reduces model size and speeds up inference, it often degrades performance, especially in low-bit settings. We identify two key reasons for the performance degradation: 1) existing quantization methods fail to align with the power-law distribution of post-Softmax activations, and 2) reparameterizing post-LayerNorm activations leads to a performance drop due to the significant influence of outliers in the scaling factors. To address these challenges, we propose DopQ-ViT, a Distribution-friendly and Outlier-aware Post-training Quantization method for ViTs. First, DopQ-ViT introduces the Tan Quantizer (TanQ), which better preserves the power-law distribution of post-Softmax activations by focusing more on values near 1. Second, DopQ-ViT presents the MAD-guided Optimal Scaling Factor (MOSF), which selects the optimal scaling factor without introducing additional calculations. Extensive experiments across various ViT models and quantization settings demonstrate that DopQ-ViT, with the help of TanQ and MOSF, outperforms previous PTQ methods on both classification and detection tasks.
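To make the outlier problem in scaling factors more concrete, the sketch below flags outliers among per-channel scaling factors using a robust score. It assumes that "MAD" refers to the median absolute deviation, which the abstract does not spell out. The function name `mad_outlier_mask`, the threshold of 3.0, and the sample values are hypothetical, and the code illustrates only the statistic, not the selection rule that MOSF actually uses.

```python
import numpy as np


def mad_outlier_mask(scales, threshold=3.0):
    """Flag entries whose robust z-score exceeds `threshold`.

    Hypothetical helper: shows the median absolute deviation (MAD)
    statistic only, not the MOSF procedure from the paper.
    """
    scales = np.asarray(scales, dtype=np.float64)
    median = np.median(scales)
    # MAD: median of the absolute deviations from the median.
    mad = np.median(np.abs(scales - median))
    if mad == 0.0:
        # At least half of the values equal the median; flag nothing.
        return np.zeros(scales.shape, dtype=bool)
    # 1.4826 rescales MAD to match the standard deviation for normal data.
    robust_z = np.abs(scales - median) / (1.4826 * mad)
    return robust_z > threshold


# Made-up per-channel scaling factors with one extreme channel.
channel_scales = np.array([0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 8.0])
mask = mad_outlier_mask(channel_scales)
print(mask)
```

Unlike the mean and standard deviation, the median and MAD are barely shifted by a single extreme channel, so the extreme value stands out clearly against the remaining channels.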