Channel-wise Hessian Aware trace-Weighted Quantization of Neural Networks
This work improves neural network quantization for deployment on resource-constrained devices by enabling more granular bit allocation, though it is incremental as it builds on existing Hessian-based approaches.
The paper tackles efficient neural network quantization by addressing channel-level redundancy, which prior methods ignored because selecting bits per channel was too complex. It introduces CW-HAWQ, a method that uses Hessian traces and deep reinforcement learning to assign quantization bits per channel, and reports better results than state-of-the-art methods on multiple networks.
Second-order information has proven to be very effective in determining the redundancy of neural network weights and activations. A recent paper proposes using Hessian traces of weights and activations for mixed-precision quantization and achieves state-of-the-art results. However, prior works focus only on selecting bits for each layer, even though the redundancy of different channels within a layer also differs considerably. This is mainly because the complexity of determining bits for each channel is too high for the original methods. Here, we introduce Channel-wise Hessian Aware trace-Weighted Quantization (CW-HAWQ). CW-HAWQ uses the Hessian trace to determine the relative sensitivity order of different channels of activations and weights. In addition, CW-HAWQ uses a deep reinforcement learning (DRL) agent based on Deep Deterministic Policy Gradient (DDPG) to find the optimal ratios of different quantization bits, and assigns bits to channels according to the Hessian trace order. The number of states in CW-HAWQ is much smaller than in traditional AutoML-based mixed-precision methods, since we only need to search for the ratios of the quantization bits. Comparing CW-HAWQ with state-of-the-art methods shows that it achieves better results for multiple networks.
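
To make the bit-assignment step concrete, the following is a minimal sketch of how per-channel bits might be assigned once the Hessian traces and the bit ratios are known. It is an illustration under stated assumptions, not the authors' implementation: the function name `assign_bits_by_trace`, the rule that channels with larger traces receive wider bit-widths, and the cumulative rounding of ratios into channel counts are all hypothetical choices. The traces are assumed to be precomputed, and the ratios stand in for what the DDPG agent would output.

```python
def assign_bits_by_trace(traces, bit_options, ratios):
    """Assign a bit-width to each channel from its Hessian trace rank.

    traces      -- one precomputed Hessian trace per channel (assumed given)
    bit_options -- candidate bit-widths, e.g. [8, 4, 2]
    ratios      -- fraction of channels for each bit-width, summing to 1
                   (a stand-in for the ratios found by the DRL agent)
    """
    if len(bit_options) != len(ratios):
        raise ValueError("bit_options and ratios must have the same length")
    if abs(sum(ratios) - 1.0) > 1e-6:
        raise ValueError("ratios must sum to 1")

    n = len(traces)
    # Channel indices ordered from most to least sensitive (largest trace first).
    order = sorted(range(n), key=lambda c: traces[c], reverse=True)
    # Pair each bit-width with its ratio, widest bit-width first.
    pairs = sorted(zip(bit_options, ratios), reverse=True)

    # Assumption: any channel left over after rounding gets the narrowest width.
    bits = [pairs[-1][0]] * n
    start, cumulative = 0, 0.0
    for width, ratio in pairs:
        cumulative += ratio
        # Cumulative rounding keeps the total number of assigned channels at n.
        end = min(n, round(cumulative * n))
        for channel in order[start:end]:
            bits[channel] = width
        start = end
    return bits


# Hypothetical example: four channels, 25% at 8 bits, 50% at 4, 25% at 2.
example_bits = assign_bits_by_trace([0.5, 3.0, 0.1, 1.2], [8, 4, 2], [0.25, 0.5, 0.25])
print(example_bits)
```

In this sketch the search only has to choose one ratio per candidate bit-width, regardless of how many channels the layer has, which reflects why searching over ratios gives a much smaller state space than choosing a bit-width for every channel independently.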