Knowledge distillation for optimization of quantized deep neural networks
This work addresses the accuracy loss that quantization causes in models intended for efficient deployment, but the contribution is incremental: it builds on existing KD methods and mainly tunes their hyper-parameters.
The paper tackles the optimization of quantized deep neural networks (QDNNs) with knowledge distillation (KD). It analyzes the effects of the KD hyper-parameters and introduces a technique that reduces the coefficient during training, reaching test accuracies of 92.7% on CIFAR-10 and 67.0% on CIFAR-100 with Resnet20 using 2-bit ternary weights.
Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction. KD, however, introduces additional hyper-parameters for QDNN training, such as the temperature, the coefficient, and the size of the teacher network. We analyze the effect of these hyper-parameters on QDNN optimization with KD. We find that these hyper-parameters are inter-related, and we also introduce a simple and effective technique that reduces the coefficient during training. With KD employing the proposed hyper-parameters, we achieve test accuracies of 92.7% and 67.0% on Resnet20 with 2-bit ternary weights for the CIFAR-10 and CIFAR-100 data sets, respectively.
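To make the roles of the temperature and the coefficient concrete, the following is a minimal sketch of a commonly used KD loss together with a coefficient that shrinks over training. It is not taken from the paper: the loss follows the widely used formulation that weights a hard-label cross-entropy against a temperature-softened teacher/student divergence, and the linear schedule, its default values, and the names `kd_loss` and `decayed_coefficient` are assumptions made purely for illustration. The abstract does not state the actual schedule.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature, coefficient):
    """Assumed KD loss: (1 - c) * CE(hard label) + c * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy uses the student's ordinary (T = 1) softmax.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # Soft term compares teacher and student distributions softened by T.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0
    )

    # T^2 keeps the soft-term gradients on a comparable scale across T.
    return (1.0 - coefficient) * hard + coefficient * temperature ** 2 * soft


def decayed_coefficient(epoch, total_epochs, start=0.9, end=0.0):
    """Hypothetical linear decay of the KD coefficient (assumed schedule)."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * frac


if __name__ == "__main__":
    student = [1.0, 0.5, -0.3]   # illustrative logits, not real data
    teacher = [2.0, 0.1, -1.0]
    for epoch in (0, 5, 9):
        c = decayed_coefficient(epoch, total_epochs=10)
        loss = kd_loss(student, teacher, label=0, temperature=4.0, coefficient=c)
        print(f"epoch={epoch} coefficient={c:.2f} loss={loss:.4f}")
```

With a schedule of this kind the teacher's soft targets dominate early in training, and the hard-label loss takes over as the coefficient approaches its final value. Because the temperature also rescales the soft term, the two settings interact, which is in line with the abstract's remark that the hyper-parameters are inter-related.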