CV · IV · Jul 4, 2020

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

arXiv:2007.02017v2 · 95 citations
AI Analysis · 50

This work improves the efficiency of deploying neural networks on hardware with mixed precision support, but it is incremental, as it builds on existing quantization techniques.

The authors tackle the problem of optimizing mixed precision quantization for deep neural networks so that it meets given computation and model-size constraints, and they achieve comparable or better performance than previous methods on MobileNetV1/V2 and ResNet18 on ImageNet.

Model quantization helps to reduce the model size and latency of deep neural networks. Mixed precision quantization is favorable on customized hardware that supports arithmetic operations at multiple bit-widths, allowing maximum efficiency. We propose a novel learning-based algorithm to derive mixed precision models end-to-end under target computation constraints and model sizes. During optimization, the bit-width of each layer or kernel in the model is a fractional value between two consecutive integer bit-widths and can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, which results in an optimized mixed precision model. Further, our method can be naturally combined with channel pruning for better computation cost allocation. Our final models achieve comparable or better performance than previous quantization methods with mixed precision on MobilenetV1/V2 and ResNet18 under different resource constraints on the ImageNet dataset.
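The fractional bit-width idea can be illustrated with a small sketch. It assumes, rather than reproduces from the paper, a symmetric uniform quantizer, a quantized value that is the linear blend q_b(x) = (1 − α)·Q_⌊b⌋(x) + α·Q_⌊b⌋+1(x) with α = b − ⌊b⌋, a computation cost of MACs × weight bits × activation bits, and a hinge penalty on exceeding the budget. The names `uniform_quantize`, `frac_quantize`, `bitops` and `budget_penalty` are hypothetical. Because the blend is linear in b, its derivative is simply Q_⌊b⌋+1(x) − Q_⌊b⌋(x), and that is what lets a gradient step move b smoothly between consecutive integers.

```python
import math
from typing import List, Tuple

# Minimal sketch; all names and formulas here are assumptions, not the FracBits code.


def uniform_quantize(x: float, bits: int, clip: float = 1.0) -> float:
    """Symmetric uniform quantizer on [-clip, clip] (assumed form, not the paper's exact one)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1          # positive grid points, e.g. 3 for 3 bits
    step = clip / levels
    x = max(-clip, min(clip, x))          # clip into the representable range
    return round(x / step) * step         # snap to the nearest grid point


def frac_quantize(x: float, b: float, clip: float = 1.0) -> float:
    """Blend the quantizers at floor(b) and floor(b) + 1 bits by the fractional part of b."""
    lo = math.floor(b)
    alpha = b - lo                        # weight on the higher bit-width, in [0, 1)
    q_lo = uniform_quantize(x, lo, clip)
    q_hi = uniform_quantize(x, lo + 1, clip)
    return (1.0 - alpha) * q_lo + alpha * q_hi


def frac_quantize_grad_b(x: float, b: float, clip: float = 1.0) -> float:
    """Derivative of frac_quantize w.r.t. b: the output is linear in b between integers."""
    lo = math.floor(b)
    return uniform_quantize(x, lo + 1, clip) - uniform_quantize(x, lo, clip)


def bitops(macs: List[float], w_bits: List[float], a_bits: List[float]) -> float:
    """Computation cost: sum over layers of MACs * weight bits * activation bits."""
    return sum(m * bw * ba for m, bw, ba in zip(macs, w_bits, a_bits))


def budget_penalty(macs: List[float], w_bits: List[float], a_bits: List[float],
                   target: float, lam: float = 1.0) -> Tuple[float, List[float]]:
    """Hinge penalty lam * max(0, cost - target) and its gradient w.r.t. each weight bit-width."""
    over = bitops(macs, w_bits, a_bits) - target
    if over <= 0:
        return 0.0, [0.0] * len(w_bits)
    grads = [lam * m * ba for m, ba in zip(macs, a_bits)]
    return lam * over, grads


if __name__ == "__main__":
    # Hypothetical 3-layer network; MAC counts are made up for illustration.
    macs = [100.0, 200.0, 50.0]
    w_bits = [4.5, 4.5, 4.5]
    a_bits = [8.0, 8.0, 8.0]
    target = 4.0 * 8.0 * sum(macs)        # budget of uniform 4-bit weights, 8-bit activations
    lr = 1e-4

    # Only the penalty drives the bit-widths here; real training also adds the task loss.
    for _ in range(100):
        penalty, grads = budget_penalty(macs, w_bits, a_bits, target)
        if penalty == 0.0:
            break
        w_bits = [max(2.0, b - lr * g) for b, g in zip(w_bits, grads)]

    print("weight bits:", [round(b, 2) for b in w_bits])
    print("cost / target:", bitops(macs, w_bits, a_bits) / target)
    print("q(0.3) at 2.5 bits:", frac_quantize(0.3, 2.5))
```

In this toy run the penalty gradient for each layer is proportional to its MAC count, so the most expensive (200-MMAC) layer gives up bits fastest. The sketch deliberately omits the task-loss gradient that would push back on each bit-width during quantization-aware training, as well as any step that fixes the bit-widths to integers at the end.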

Code Implementations: 1 repo