LG | CV | ML | Aug 13, 2020

Weight Equalizing Shift Scaler-Coupled Post-training Quantization

arXiv:2008.05767v1 | 3 citations
Originality: Incremental advance
AI Analysis

The paper addresses accuracy loss from quantization when deploying networks on hardware such as neural processing units. The advance is incremental because it builds on existing layer-wise quantization methods.

The paper tackles severe accuracy degradation in post-training quantization of neural networks whose per-channel weight ranges differ widely, such as MobileNets. It proposes a weight equalizing shift scaler that rescales weights with a 4-bit binary shift before quantization, reaching a top-1 accuracy of 69.78% to 70.96% on ImageNet.

Post-training layer-wise quantization is preferable because it requires no retraining and is hardware-friendly. Nevertheless, accuracy degrades when a neural network model has large differences in per-output-channel weight ranges. In particular, the MobileNet family suffers a drastic drop in top-1 accuracy, from 70.60% ~ 71.87% to 0.1% on the ImageNet dataset, after 8-bit weight quantization. To mitigate this significant accuracy reduction, we propose a new weight equalizing shift scaler, i.e. rescaling the weight range per channel by a 4-bit binary shift prior to layer-wise quantization. To recover the original output range, inverse binary shifting is efficiently fused into the existing per-layer scale compounding in the fixed-computing convolutional operator of the custom neural processing unit. The binary shift is a key feature of our algorithm: it significantly improves accuracy without compromising the memory footprint. As a result, our proposed method achieves a top-1 accuracy of 69.78% ~ 70.96% on MobileNets and shows robust performance across varying network models and tasks, which is competitive with channel-wise quantization results.
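The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, symmetric 8-bit quantization is assumed, and the rule for choosing each channel's shift (the largest power of two that keeps the channel within the layer's maximum range) is an assumption made for illustration. The inverse shift is applied here during dequantization, standing in for the fused rescaling on the accelerator.

```python
import numpy as np

def quantize_layerwise(w, num_bits=8):
    """Symmetric quantization with a single scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(float(np.max(np.abs(w))), 1e-12)  # guard against all-zero weights
    scale = max_abs / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def shift_scaled_quantize(w, num_bits=8, shift_bits=4):
    """Equalize per-output-channel ranges with power-of-two shifts, then
    quantize layer-wise. `w` has shape (out_channels, ...).

    Assumed shift rule (not taken from the paper): the largest shift s with
    channel_max * 2**s <= layer_max, capped at 2**shift_bits - 1.
    """
    ch_max = np.abs(w.reshape(w.shape[0], -1)).max(axis=1)
    layer_max = ch_max.max()
    ratio = layer_max / np.maximum(ch_max, 1e-12)
    max_shift = 2 ** shift_bits - 1
    shifts = np.clip(np.floor(np.log2(ratio)), 0, max_shift).astype(np.int64)

    bshape = (-1,) + (1,) * (w.ndim - 1)  # broadcast shifts over each channel
    w_eq = w * (2.0 ** shifts).reshape(bshape)
    q, scale = quantize_layerwise(w_eq, num_bits)
    return q, scale, shifts

def dequantize(q, scale, shifts):
    """Undo the layer scale and the per-channel shift (the inverse shift)."""
    bshape = (-1,) + (1,) * (q.ndim - 1)
    return q.astype(np.float64) * scale / (2.0 ** shifts).reshape(bshape)
```

With a weight tensor whose second channel is roughly 500 times smaller than the first, plain layer-wise quantization rounds the small channel to zero, while the shifted version keeps it representable at the cost of one 4-bit shift value per output channel.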

