Iteratively Training Look-Up Tables for Network Quantization
This work addresses the need for efficient neural network deployment on devices with limited resources. It presents a flexible quantization method that can also be adapted for pruning or multiplier-less operation. The contribution is incremental, since it builds on existing quantization techniques.
The paper tackles the problem of reducing the memory and computational requirements of deep neural networks on resource-limited devices. It introduces LUT-Q, a training method that learns a dictionary for weight quantization, and reports better performance than other methods at the same quantization bitwidth on image recognition and object detection tasks.
Operating deep neural networks on devices with limited resources requires reducing their memory footprint and computational requirements. In this paper, we introduce a training method called look-up table quantization (LUT-Q), which learns a dictionary and assigns each weight to one of the dictionary's values. We show that this method is very flexible and that many other techniques can be seen as special cases of LUT-Q. For example, we can constrain the dictionary trained with LUT-Q to generate networks with pruned weight matrices, or restrict the dictionary to powers of two to avoid the need for multiplications. To obtain fully multiplier-less networks, we also introduce a multiplier-less version of batch normalization. Extensive experiments on image recognition and object detection tasks show that LUT-Q consistently achieves better performance than other methods with the same quantization bitwidth.
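To make the dictionary-and-assignment idea concrete, the following is a minimal sketch of one iteration on a flat list of weights. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not spell out the update rule, so the k-means-style mean update used here is an assumption, and the function names (`assign`, `update_dictionary`, `to_power_of_two`, `constrain`, `lutq_step`) and the `prune` and `pow2` flags are hypothetical. Gradient-based training of the underlying weights and the multiplier-less batch normalization are not shown.

```python
import math


def assign(weights, dictionary):
    """For each weight, return the index of the nearest dictionary value."""
    return [
        min(range(len(dictionary)), key=lambda k: abs(w - dictionary[k]))
        for w in weights
    ]


def update_dictionary(weights, assignments, dictionary):
    """Assumed k-means-style step: move each entry to the mean of its weights."""
    new_dict = list(dictionary)
    for k in range(len(dictionary)):
        members = [w for w, a in zip(weights, assignments) if a == k]
        if members:  # entries with no assigned weights are left unchanged
            new_dict[k] = sum(members) / len(members)
    return new_dict


def to_power_of_two(d):
    """Round to the nearest signed power of two in the log domain; 0 stays 0."""
    if d == 0:
        return 0.0
    return math.copysign(2.0 ** round(math.log2(abs(d))), d)


def constrain(dictionary, prune=False, pow2=False):
    """Apply the optional dictionary constraints mentioned in the abstract."""
    out = list(dictionary)
    if pow2:
        out = [to_power_of_two(d) for d in out]
    if prune:
        out[0] = 0.0  # pin entry 0 to zero so weights assigned to it are pruned
    return out


def lutq_step(weights, dictionary, prune=False, pow2=False):
    """One iteration: constrain, assign, update, constrain, then look up."""
    dictionary = constrain(dictionary, prune, pow2)
    assignments = assign(weights, dictionary)
    dictionary = update_dictionary(weights, assignments, dictionary)
    dictionary = constrain(dictionary, prune, pow2)
    quantized = [dictionary[a] for a in assignments]
    return dictionary, assignments, quantized
```

With a four-entry dictionary (2-bit quantization), a call such as `lutq_step(weights, [0.0, -1.0, 1.0, 0.5], prune=True, pow2=True)` returns the updated dictionary, the per-weight indices, and the quantized weights. Only the indices and the small dictionary would need to be stored, which is where the memory saving comes from. With a power-of-two dictionary, each multiplication by a weight can in principle be replaced by a bit shift.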