CV | Apr 26, 2022

RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization

arXiv:2204.12322v2 | 21 citations | h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient quantization methods that satisfy hardware constraints without retraining, which matters when deploying models on resource-limited devices. The advance is incremental, since it builds on existing PTQ techniques.

The paper tackles Power-of-Two low-bit post-training quantization (PTQ) for deep neural networks. The scheme is hardware-friendly, but its limited set of scale-factor candidates makes it prone to rounding and clipping errors. The proposed method, RAPQ, dynamically adjusts the scales across the whole network to balance these errors, reaching 65% and 48% accuracy on ResNet-18 and MobileNetV2, respectively, with INT2 weights and INT4 activations on ImageNet.
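The rounding/clipping trade-off mentioned above can be illustrated with a minimal sketch. It is not the RAPQ algorithm: it only quantizes a handful of made-up values with a single Power-of-Two scale and splits the squared error into a rounding part and a clipping part. The function names `quantize_pot` and `error_split`, the sample values, and the choice of signed symmetric integer ranges are all assumptions made for illustration.

```python
def quantize_pot(x, exponent, bits):
    """Quantize x with scale 2**exponent on a signed `bits`-bit grid,
    then return the dequantized value (illustrative helper)."""
    scale = 2.0 ** exponent
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))  # round, then clip
    return q * scale


def error_split(values, exponent, bits):
    """Return (rounding_error, clipping_error) as sums of squared errors.
    A value counts as clipped when its rounded integer falls outside the grid."""
    scale = 2.0 ** exponent
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    rounding = clipping = 0.0
    for x in values:
        err = (x - quantize_pot(x, exponent, bits)) ** 2
        if qmin <= round(x / scale) <= qmax:
            rounding += err
        else:
            clipping += err
    return rounding, clipping


# Hypothetical sample values; only Power-of-Two scales are tried.
values = [0.1, -0.3, 0.7, 1.9, -2.5]
for exponent in (-3, -2, -1):
    r, c = error_split(values, exponent, bits=4)
    print(f"scale=2^{exponent}: rounding={r:.4f} clipping={c:.4f}")
```

With these sample values, the finest scale (2^-3) clips the two largest magnitudes, while the coarsest scale (2^-1) clips nothing but rounds more coarsely. Because only Power-of-Two scales are available, there is no intermediate scale to pick between adjacent candidates, which is the constraint the paper sets out to handle.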

We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not call for lengthy retraining. Power-of-Two quantization can convert the multiplications introduced by quantization and dequantization into bit-shifts, which many efficient accelerators adopt. However, Power-of-Two scale factors have fewer candidate values, which leads to more rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet demonstrate the excellent performance of our proposed method. Without bells and whistles, RAPQ can reach accuracy of 65% and 48% on ResNet-18 and MobileNetV2, respectively, with INT2 weights and INT4 activations. We are the first to propose the more constrained but hardware-friendly Power-of-Two quantization scheme specifically for low-bit PTQ and to show that it can achieve nearly the same accuracy as SOTA PTQ methods. The code has been released.
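The claim that a Power-of-Two scale turns multiplication into a bit-shift can be checked with a short sketch. It assumes an integer accumulator that must be rescaled by 2^-shift with round-half-up rounding; the function name `requantize_shift` and this rounding convention are assumptions for illustration and are not taken from the paper or its released code.

```python
import math


def requantize_shift(acc, shift):
    """Rescale an integer accumulator by 2**-shift using only an add and a
    right shift (round-half-up). Hypothetical helper; requires shift >= 1."""
    half = 1 << (shift - 1)       # rounding offset, equal to 0.5 after the shift
    return (acc + half) >> shift  # arithmetic shift floors toward -infinity


def requantize_multiply(acc, shift):
    """Reference version that uses a floating-point multiply instead."""
    return math.floor(acc * 2.0 ** -shift + 0.5)


# The two versions agree on every accumulator value in this range.
for shift in range(1, 8):
    for acc in range(-1000, 1001):
        assert requantize_shift(acc, shift) == requantize_multiply(acc, shift)
```

A general (non-Power-of-Two) scale would need a real multiplier or a fixed-point multiply followed by a shift, which is the hardware cost the Power-of-Two constraint avoids.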

Code Implementations: 1 repo
