Spirit Distillation: A Model Compression Method with Multi-domain Knowledge Transfer
This work addresses the need for efficient and robust models in scenarios with limited data and computational resources, representing an incremental advancement in knowledge distillation methods.
The paper tackles the problem of model compression with cross-domain knowledge transfer by proposing Spirit Distillation, which transfers general knowledge from a teacher to a student network, achieving improvements of 1.4% in mIOU and 8.2% in high-precision accuracy while reducing FLOPs to 41.8%.
Recent applications require machine learning models to support both cross-domain knowledge transfer and model compression, because training data are insufficient and computational resources are limited. In this paper, we propose a new knowledge distillation model, named Spirit Distillation (SD), which is a model compression method with multi-domain knowledge transfer. The compact student network learns to mimic a representation equivalent to that of the front part of the teacher network, through which general knowledge can be transferred from the source domain (teacher) to the target domain (student). To further improve the robustness of the student, we extend SD to Enhanced Spirit Distillation (ESD), which exploits more comprehensive knowledge by introducing a proximity domain, similar to the target domain, for feature extraction. Results demonstrate that our method can boost mIOU and high-precision accuracy by 1.4% and 8.2% respectively with 78.2% segmentation variance, and yields a precise, compact network with only 41.8% FLOPs.
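The feature-mimicking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss function, so mean squared error is assumed here, and the names `feature_mimic_loss`, `build_batch`, `teacher_front`, and `make_student` are hypothetical. The toy "networks" are simple scaling functions standing in for the front part of the teacher and the corresponding part of the student.

```python
from typing import Callable, List, Sequence

Vector = List[float]
FeatureFn = Callable[[Sequence[float]], Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_mimic_loss(teacher: FeatureFn, student: FeatureFn,
                       batch: Sequence[Sequence[float]]) -> float:
    """Average feature distance between teacher and student over a batch.

    Assumption: MSE is used as the distance; the source does not specify it.
    """
    return sum(mse(teacher(x), student(x)) for x in batch) / len(batch)


def build_batch(target_samples: Sequence[Sequence[float]],
                proximity_samples: Sequence[Sequence[float]] = ()) -> list:
    """SD-style batch: target-domain inputs only.
    ESD-style batch: target-domain plus proximity-domain inputs."""
    return list(target_samples) + list(proximity_samples)


# Toy stand-ins (hypothetical): the teacher's front part doubles its input,
# and the student scales its input by a learnable factor.
def teacher_front(x: Sequence[float]) -> Vector:
    return [2.0 * v for v in x]


def make_student(scale: float) -> FeatureFn:
    return lambda x: [scale * v for v in x]


target = [[1.0, 2.0], [0.0, 1.0]]   # target-domain inputs
proximity = [[3.0, 1.0]]            # proximity-domain inputs (ESD only)

sd_loss = feature_mimic_loss(teacher_front, make_student(1.0),
                             build_batch(target))
esd_loss = feature_mimic_loss(teacher_front, make_student(1.0),
                              build_batch(target, proximity))
matched = feature_mimic_loss(teacher_front, make_student(2.0),
                             build_batch(target, proximity))
```

In this toy setting the loss falls to zero once the student reproduces the teacher's features (`matched`), and the ESD-style batch simply widens the set of inputs on which the two representations are compared.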