CV | Nov 18, 2019

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

arXiv:1911.07471v3 | 84 citations
Originality: Incremental advance
AI Analysis

For machine learning practitioners, this work addresses imperfect and overly uncertain teacher supervision in knowledge distillation, offering incremental improvements over existing methods.

The paper tackles the problem of imperfect teacher supervision in knowledge distillation by introducing two techniques, Knowledge Adjustment and Dynamic Temperature Distillation, that penalize bad supervision. The student model's performance improves on CIFAR-100, CINIC-10, and Tiny ImageNet, with further gains when the techniques are combined with other KD methods.

Knowledge distillation (KD) is widely used to train a compact model under the supervision of a larger model, which can effectively improve the compact model's performance. Previous methods mainly focus on two aspects: 1) training the student to mimic the representation space of the teacher; 2) training the model progressively or adding extra modules such as a discriminator. Knowledge from the teacher is useful, but it is not always correct when compared with the ground truth. In addition, overly uncertain supervision degrades the result. We introduce two novel approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), to penalize bad supervision and improve the student model. Experiments on CIFAR-100, CINIC-10, and Tiny ImageNet show that our methods achieve encouraging performance compared with state-of-the-art methods. When combined with other KD-based methods, performance improves further.
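To make the setting concrete, here is a minimal sketch of a temperature-softened distillation loss in plain Python. The abstract above does not spell out how KA or DTD are computed, so the helper `adjust_teacher` is only a hypothetical illustration of "penalizing bad supervision" (it swaps the teacher's wrongly predicted top probability with the ground-truth class probability), and the function names and the default temperature of 4.0 are assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

def adjust_teacher(probs, label):
    """Hypothetical adjustment (an assumption for illustration):
    if the teacher's top class is not the ground-truth label,
    swap the two probabilities so the label becomes the top class."""
    top = max(range(len(probs)), key=probs.__getitem__)
    adjusted = list(probs)
    if top != label:
        adjusted[top], adjusted[label] = adjusted[label], adjusted[top]
    return adjusted
```

A per-sample temperature could be passed to `kd_loss` in place of the fixed default, which is one way to read "dynamic temperature"; how such a temperature would be chosen is not stated in the abstract and is left open here.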

Code Implementations: 1 repo