CV · Feb 15, 2021

FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation

Chaofan Tao, Rui Lin, Quan Chen, Zhaoyang Zhang, Ping Luo, Ngai Wong

arXiv:2102.07444v26.58 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses efficient model deployment for deep learning practitioners: its 4-bit ResNet-18 and MobileNet-V2 models require 54.9X and 45.7X fewer computations, respectively, than their full-precision counterparts. It is incremental, however, in that it builds on existing quantization methods.

The paper tackles the performance drop in low-bitwidth convolutional neural networks by introducing Frequency-Aware Transformation (FAT), a quantization pipeline that transforms weights in the frequency domain before quantization. Trained in 4 bits, ResNet-18 and MobileNet-V2 reach 70.5% and 69.2% top-1 accuracy on ImageNet, respectively.

Learning convolutional neural networks (CNNs) with low bitwidth is challenging because performance may drop significantly after quantization. Prior work often discretizes the network weights by carefully tuning the hyper-parameters of quantization (e.g. non-uniform step size and layer-wise bitwidths), which is complicated and sub-optimal because the full-precision and low-precision models have a large discrepancy. This work presents a novel quantization pipeline, Frequency-Aware Transformation (FAT), which has several appealing benefits. (1) Rather than designing complicated quantizers like existing works, FAT learns to transform network weights in the frequency domain before quantization, making them more amenable to training in low bitwidth. (2) With FAT, CNNs can be easily trained in low precision using simple standard quantizers without tedious hyper-parameter tuning. Theoretical analysis shows that FAT improves both uniform and non-uniform quantizers. (3) FAT can be easily plugged into many CNN architectures. When training ResNet-18 and MobileNet-V2 in 4 bits, FAT plus a simple rounding operation already achieves 70.5% and 69.2% top-1 accuracy on ImageNet without bells and whistles, outperforming recent state-of-the-art methods while reducing computation by 54.9X and 45.7X relative to full-precision models. We hope FAT provides a novel perspective for model quantization. Code is available at https://github.com/ChaofanTao/FAT_Quantization.
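The idea of transforming weights in the frequency domain and then applying a simple rounding quantizer can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `frequency_transform` and `uniform_quantize`, the choice of a 2-D FFT over an 8x8 weight matrix, and the fixed `mask` are all assumptions made for illustration. In FAT the transformation is learned during training, whereas the mask here is set by hand.

```python
import numpy as np

def frequency_transform(w, mask):
    """Reweight the spectrum of w elementwise, then return to the weight domain.

    `mask` stands in for the learned transformation (hypothetical: fixed here).
    """
    spectrum = np.fft.rfft2(w)                       # real-input 2-D FFT
    return np.fft.irfft2(spectrum * mask, s=w.shape)  # back to a real array

def uniform_quantize(w, bits=4):
    """Symmetric uniform quantizer: scale to [-1, 1], round to a grid, rescale."""
    levels = 2 ** (bits - 1) - 1   # 7 positive levels for 4 bits
    scale = np.max(np.abs(w))
    if scale == 0:
        return w.copy()
    return np.round(w / scale * levels) / levels * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))   # toy stand-in for a layer's weights

# Hypothetical mask: keep low frequencies, damp the rest by half.
mask = np.full((8, 5), 0.5)         # rfft2 of an 8x8 array has shape (8, 5)
mask[:2, :2] = 1.0

transformed = frequency_transform(weights, mask)
quantized = uniform_quantize(transformed, bits=4)
print("distinct quantized values:", np.unique(quantized).size)
```

The quantizer itself stays deliberately simple, which mirrors claim (2) of the abstract: the work of making weights easier to quantize is moved into the transformation rather than into the quantizer design.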
