MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training
This work addresses the challenge of deploying neural networks efficiently in resource-constrained applications, and it reports a substantial, specific gain in low-bit accuracy rather than an incremental improvement.
The paper tackles performance degradation in Quantization-Aware Training (QAT) under extremely low-bit settings. It proposes MEC-Quant, which optimizes the structure of the learned representation to reduce quantization-induced bias. The reported accuracy is comparable to or surpasses that of full-precision models, and the limit of QAT is pushed to x-bit activation for the first time.
Quantization-Aware Training (QAT) has attracted much attention as a way to produce efficient neural networks. Current QAT methods still perform worse than their Full Precision (FP) counterparts. In this work, we argue that quantization inevitably introduces biases into the learned representation, especially under the extremely low-bit setting. To cope with this issue, we propose Maximum Entropy Coding Quantization (MEC-Quant), a more principled objective that explicitly optimizes the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen in-distribution samples. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective based on Mixture of Experts (MoE) that not only allows fast computation but also handles the long-tailed distribution of weight or activation values. Extensive experiments on various computer vision tasks demonstrate its superiority. With MEC-Quant, the limit of QAT is pushed to x-bit activation for the first time, and the accuracy of MEC-Quant is comparable to or even surpasses that of the FP counterpart. Without bells and whistles, MEC-Quant establishes a new state of the art for QAT.
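To make the idea of using the minimal coding length as an entropy surrogate concrete, the following is a minimal sketch, not the paper's implementation. It assumes the log-det coding-length formula commonly used in the lossy data coding literature, L = (m + d)/2 · log det(I + d/(m·eps²) · ZᵀZ), for m zero-mean feature vectors of dimension d and an allowed distortion eps. The abstract does not state the exact objective, and the MoE-based reformulation is not reproduced here. The function names `log_det_spd` and `coding_length`, the default `eps`, and the toy inputs are all hypothetical choices made for illustration.

```python
import math


def log_det_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]  # lower-triangular Cholesky factor
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
                # det(A) = prod(diag(L))^2, so each pivot adds 2*log(L_ii)
                log_det += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return log_det


def coding_length(z, eps=0.5):
    """Coding length (in nats) of the rows of z up to distortion eps.

    z is a list of m feature vectors of dimension d, assumed zero-mean.
    Assumed formula: (m + d) / 2 * log det(I + d / (m * eps^2) * Z^T Z).
    """
    m, d = len(z), len(z[0])
    scale = d / (m * eps * eps)
    # Build the d x d matrix I + scale * Z^T Z.
    mat = [
        [scale * sum(row[p] * row[q] for row in z) + (1.0 if p == q else 0.0)
         for q in range(d)]
        for p in range(d)
    ]
    return 0.5 * (m + d) * log_det_spd(mat)


# Toy comparison: features confined to one axis vs. spread over both axes.
collapsed = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(coding_length(collapsed), coding_length(spread))
```

On these two toy inputs the spread features yield the longer coding length, which is the behaviour a maximum-entropy objective would reward: representations that occupy more of the feature space, rather than collapsing onto a few directions, need more bits to encode. In a training setting this quantity would be computed with a differentiable tensor library and added to the task loss; that integration is outside the scope of this sketch.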