LG · Oct 4, 2023

Improving Knowledge Distillation with Teacher's Explanation

arXiv:2310.02572v1 · 1 citation · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses limited knowledge transfer in model compression, offering an incremental improvement by extending distillation with the teacher's explanations.

The paper tackles the limited knowledge transfer of standard knowledge distillation by introducing a framework in which the student learns from both the teacher's predictions and the teacher's explanations. The resulting students are reported to substantially outperform traditional knowledge distillation students of similar complexity across a variety of datasets.

Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This limits the amount of transferred knowledge. In this work, we introduce a novel Knowledge Explaining Distillation (KED) framework, which allows the student to learn not only from the teacher's predictions but also from the teacher's explanations. We propose a class of superfeature-explaining teachers that provide explanation over groups of features, along with the corresponding student model. We also present a method for constructing the superfeatures. We then extend KED to reduce complexity in convolutional neural networks, to allow augmentation with hidden-representation distillation methods, and to work with a limited amount of training data using chimeric sets. Our experiments over a variety of datasets show that KED students can substantially outperform KD students of similar complexity.
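
The abstract describes the idea but not the loss function, so the following is only a minimal sketch of how a prediction-plus-explanation objective could look, not the paper's actual formulation. It assumes the standard temperature-scaled KD loss and adds a hypothetical explanation term: a mean squared error between teacher and student scores over superfeatures. The names `ked_loss`, `explanation_mismatch`, `alpha`, and `beta`, and the choice of squared error, are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Hard-label loss: negative log-probability of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def explanation_mismatch(teacher_expl, student_expl):
    """ASSUMPTION: explanations are one score per superfeature,
    compared with mean squared error. The paper may use another measure."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_expl, student_expl)]
    return sum(diffs) / len(diffs)


def ked_loss(student_logits, teacher_logits, label,
             student_expl, teacher_expl,
             temperature=2.0, alpha=0.5, beta=0.5):
    """Hypothetical combined objective: hard-label loss, standard KD
    soft-label loss, and an explanation-matching term weighted by beta."""
    hard = cross_entropy(softmax(student_logits), label)
    soft = (temperature ** 2) * kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    expl = explanation_mismatch(teacher_expl, student_expl)
    return (1 - alpha) * hard + alpha * soft + beta * expl


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [3.0, 0.2, -2.0]
    # Two superfeatures; the student's explanation differs from the teacher's.
    print(ked_loss(student, teacher, 0, [0.5, 0.5], [0.1, 0.9]))
```

Setting `beta=0.0` in this sketch recovers the ordinary KD objective, which makes the explanation term easy to ablate when comparing a KD student against a KED-style student.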
