CV · LG · Oct 26, 2020

Activation Map Adaptation for Effective Knowledge Distillation

arXiv:2010.13500v2
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency and accuracy trade-offs in deploying neural networks on mobile devices, but it is incremental as it builds on existing knowledge distillation methods.

The paper tackles model compression for deploying neural networks on embedded devices by proposing a knowledge distillation strategy that uses an activation map adaptive module to improve student network training. Results on CIFAR-10 show a 0.6% accuracy boost and 6.5% loss reduction for the student network.

Model compression has become a recent trend because of the need to deploy neural networks on embedded and mobile devices, where both accuracy and efficiency are critically important. To balance the two, a knowledge distillation strategy is proposed for general visual representation learning. It uses our activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training. The teacher's hidden-layer output then guides the training of the student network so that effective semantic information is transferred. To verify the effectiveness of the strategy, we applied the method to the CIFAR-10 dataset. Results demonstrate that the method boosts the accuracy of the student network by 0.6% with a 6.5% loss reduction, and significantly improves its training speed.
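The idea of supervising a student with a teacher's hidden-layer output can be illustrated with a generic hint-style feature loss. The sketch below is not the paper's activation map adaptive module. It only shows, under stated assumptions, how a task loss and a feature-matching term might be combined. The function names, the plain-list feature vectors, and the `hint_weight` parameter are all hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard classification loss for one example with an integer label."""
    return -math.log(softmax(logits)[label])


def hint_loss(student_features, teacher_features):
    """Mean squared error between student and teacher hidden activations.

    Assumes both feature vectors are already flattened to the same length;
    in practice an adapter layer is usually needed to match shapes.
    """
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    n = len(student_features)
    return sum((s - t) ** 2 for s, t in zip(student_features, teacher_features)) / n


def distillation_loss(student_logits, label, student_features,
                      teacher_features, hint_weight=0.5):
    """Task loss plus a weighted feature-matching term (hypothetical weighting)."""
    task = cross_entropy(student_logits, label)
    hint = hint_loss(student_features, teacher_features)
    return task + hint_weight * hint


# Example: one sample with 3 classes and 4-dimensional hidden features.
loss = distillation_loss(
    student_logits=[0.2, 1.5, -0.3],
    label=1,
    student_features=[0.1, 0.4, 0.0, 0.9],
    teacher_features=[0.2, 0.5, 0.1, 0.7],
)
print(loss)
```

The feature term pulls the student's intermediate representation toward the teacher's, while the cross-entropy term keeps the student tied to the ground-truth labels. How the supervisory teacher features are selected is exactly what the paper's adaptive module addresses and is not modeled here.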

