Teacher Network Calibration Improves Cross-Quality Knowledge Distillation
This work addresses computational efficiency in computer vision applications: it lets practitioners run inference on lower-resolution images with improved accuracy.
The paper studies cross-quality knowledge distillation (CQKD), which transfers knowledge from a teacher network using full-resolution images to a student network using low-resolution images, and shows that it outperforms supervised learning in large-scale image classification while reducing computational load at inference.
We investigate cross-quality knowledge distillation (CQKD), a knowledge distillation method in which knowledge from a teacher network trained with full-resolution images is transferred to a student network that takes low-resolution images as input. Because image size is a deciding factor in the computational load of computer vision applications, CQKD notably reduces these requirements by using only the student network at inference time. Our experimental results show that CQKD outperforms supervised learning in large-scale image classification problems. We also highlight the importance of calibrating neural networks: we show that with higher temperature smoothing of the teacher's output distribution, the student distribution exhibits a higher entropy, which leads to both a lower calibration error and a higher network accuracy.
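The temperature smoothing described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the example logits, and the choice of a KL-divergence loss scaled by T² (a common convention in knowledge distillation) are all assumptions made for illustration. The sketch shows how dividing the logits by a temperature T before the softmax raises the entropy of the teacher's distribution, and how a distillation loss can compare teacher and student at the same temperature.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(z / T). Larger T gives a flatter output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def distillation_loss(teacher_logits, student_logits, temperature):
    """KL(teacher || student) on temperature-smoothed distributions.

    The T^2 factor is an assumed convention, not taken from the paper;
    it keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits: the teacher sees a full-resolution image,
# the student sees a low-resolution version of the same image.
teacher_logits = [6.0, 2.0, 1.0]
student_logits = [3.0, 2.5, 0.5]

for t in (1.0, 4.0):
    smoothed = softmax(teacher_logits, t)
    print(f"T={t}: teacher entropy={entropy(smoothed):.4f}, "
          f"loss={distillation_loss(teacher_logits, student_logits, t):.4f}")
```

In a training loop, the student's parameters would be updated to minimize this loss, usually alongside a supervised term on the ground-truth labels; only the student would then be kept for inference.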