LG | AI | CV | Jul 10, 2023

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

arXiv:2307.04535v1 | 7 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This work targets efficient neural network inference on mobile and embedded devices by providing a fast and accurate method for reallocating bitwidths during training. The contribution is incremental, since it builds on existing mixed-precision quantization techniques.

The paper tackles the challenge of finding optimal bitwidth allocations for mixed-precision quantized neural networks, which is computationally expensive because the search space grows exponentially with the number of layers. It proposes QBitOpt, an algorithm that formulates allocation as a constrained optimization problem and combines fast-to-compute sensitivities with efficient solvers during quantization-aware training. The resulting networks are guaranteed to satisfy the resource constraints, and the authors report that QBitOpt outperforms existing fixed and mixed-precision methods on ImageNet under commonly used average bitwidth constraints.
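
One way to write down the allocation problem summarized above is sketched below. The notation is an assumption for illustration and is not taken from the paper: $L$ is the number of layers, $b_l$ the bitwidth of layer $l$, $\mathcal{B}$ the set of candidate bitwidths, $s_l(b_l)$ a sensitivity score estimating the loss increase from quantizing layer $l$ to $b_l$ bits, $n_l$ the number of parameters in layer $l$, and $\bar{b}$ the average bitwidth budget. The paper's exact objective and its way of weighting the average may differ.

```latex
\begin{aligned}
\min_{b_1,\dots,b_L}\quad & \sum_{l=1}^{L} s_l(b_l) \\
\text{subject to}\quad & \frac{\sum_{l=1}^{L} n_l\, b_l}{\sum_{l=1}^{L} n_l} \le \bar{b}, \\
& b_l \in \mathcal{B}, \qquad l = 1,\dots,L .
\end{aligned}
```

Under this notation an exhaustive search would have to consider $|\mathcal{B}|^{L}$ allocations, for example $3^{50} \approx 7 \times 10^{23}$ for three candidate bitwidths and 50 layers, which is the exponential growth the summary refers to.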

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance for the same resource constraint compared to networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constraint optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints commonly found in the literature.
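
To make the idea of "sensitivities plus a solver" concrete, here is a minimal sketch of a budget-constrained allocation step. It is not the solver used in QBitOpt: it swaps in a simple greedy heuristic, and the function name `allocate_bitwidths`, the sensitivity table layout, and the parameter-weighted average bitwidth are all assumptions made for illustration. The point it shows is that an allocation built this way satisfies the budget by construction, unlike bitwidths learned purely through gradients.

```python
from typing import Dict, List, Sequence


def allocate_bitwidths(
    sensitivities: Sequence[Dict[int, float]],
    layer_sizes: Sequence[int],
    candidates: Sequence[int],
    avg_bit_budget: float,
) -> List[int]:
    """Greedy allocation (hypothetical helper, not the paper's solver).

    sensitivities[l][b] is an assumed estimate of the loss increase when
    layer l is quantized to b bits. The budget is a parameter-weighted
    average bitwidth over all layers.
    """
    bits = sorted(candidates)
    budget = avg_bit_budget * sum(layer_sizes)  # total bits allowed
    level = [0] * len(layer_sizes)  # index into `bits` for each layer
    used = sum(n * bits[0] for n in layer_sizes)
    if used > budget:
        raise ValueError("budget is infeasible even at the lowest bitwidth")

    while True:
        best_layer, best_gain = None, 0.0
        for l, n in enumerate(layer_sizes):
            if level[l] + 1 >= len(bits):
                continue  # layer is already at the highest bitwidth
            cur, nxt = bits[level[l]], bits[level[l] + 1]
            extra = n * (nxt - cur)
            if used + extra > budget:
                continue  # this upgrade would violate the constraint
            # Sensitivity reduction per extra bit spent.
            gain = (sensitivities[l][cur] - sensitivities[l][nxt]) / extra
            if gain > best_gain:
                best_layer, best_gain = l, gain
        if best_layer is None:
            break  # no feasible upgrade improves the objective
        step = bits[level[best_layer] + 1] - bits[level[best_layer]]
        used += layer_sizes[best_layer] * step
        level[best_layer] += 1

    return [bits[i] for i in level]


# Toy example with made-up sensitivities for three layers.
example_sens = [
    {2: 9.0, 4: 3.0, 8: 1.0},
    {2: 4.0, 4: 1.0, 8: 0.5},
    {2: 1.0, 4: 0.5, 8: 0.4},
]
example_sizes = [10, 20, 10]
example_alloc = allocate_bitwidths(example_sens, example_sizes, [2, 4, 8], 5.0)
print(example_alloc)
```

In a QAT loop, a step like this would be re-run periodically as the sensitivity estimates are refreshed. A greedy pass is not guaranteed to find the optimal allocation, so a sketch like this only illustrates the constraint-satisfaction property.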
