LG · Jun 19, 2021

Teacher's pet: understanding and mitigating biases in distillation

arXiv:2106.10494v2 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses biases in knowledge distillation and offers machine learning practitioners a strategy for mitigating subgroup performance issues. It is an incremental advance, since it builds on existing distillation methods.

The paper tackles the problem of knowledge distillation harming performance on certain data subgroups, such as classes with few samples, and shows that teacher errors are transferred to and amplified by the student model. It presents techniques that soften the teacher's influence on subgroups where the teacher is less reliable, preserving the overall accuracy gains of distillation while improving subgroup performance on image classification benchmarks.
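The core measurement here is whether an overall accuracy gain hides a per-subgroup loss. A minimal sketch, assuming subgroups are simply classes (as in the "classes with few samples" example) and that predictions are integer class ids: it computes accuracy restricted to each class and compares a baseline student against a distilled one. The function names `per_class_accuracy` and `subgroup_report`, and the use of worst-class accuracy as a summary, are illustrative choices, not the paper's code.

```python
import numpy as np

def per_class_accuracy(labels, preds, num_classes):
    """Accuracy restricted to each class (subgroup); NaN for classes with no examples."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            # Fraction of examples of class c that were predicted as c
            acc[c] = np.mean(preds[mask] == c)
    return acc

def subgroup_report(labels, base_preds, distilled_preds, num_classes):
    """Compare a baseline and a distilled student overall and per class (hypothetical helper)."""
    labels = np.asarray(labels)
    base_preds = np.asarray(base_preds)
    distilled_preds = np.asarray(distilled_preds)
    base = per_class_accuracy(labels, base_preds, num_classes)
    dist = per_class_accuracy(labels, distilled_preds, num_classes)
    overall_base = float(np.mean(base_preds == labels))
    overall_dist = float(np.mean(distilled_preds == labels))
    return {
        "overall_delta": overall_dist - overall_base,
        "per_class_delta": dist - base,              # negative entries = subgroups harmed
        "worst_class_base": float(np.nanmin(base)),
        "worst_class_distilled": float(np.nanmin(dist)),
    }
```

With made-up toy labels `[0, 0, 0, 0, 1, 1]`, a baseline predicting `[0, 0, 0, 1, 1, 1]` and a distilled student predicting `[0, 0, 0, 0, 1, 0]` have the same overall accuracy (5/6), yet `per_class_delta` is `[0.25, -0.5]`: the two-sample class loses half its accuracy while the aggregate number does not move.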

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain subgroups, e.g., classes with few associated samples. We trace this behaviour to errors made by the teacher distribution being transferred to and amplified by the student model. To mitigate this problem, we present techniques which soften the teacher influence for subgroups where it is less reliable. Experiments on several image classification benchmarks show that these modifications of distillation maintain the boost in overall accuracy, while additionally ensuring improvement in subgroup performance.
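The abstract does not spell out the exact rule for softening the teacher's influence, so the sketch below is one hypothetical instance rather than the paper's method. It uses the common distillation objective (a mix of cross-entropy on one-hot labels and cross-entropy against the teacher's temperature-scaled distribution, scaled by T² as in standard Hinton-style distillation), but makes the mixing weight α depend on each example's true class. The assumed heuristic `per_class_alpha` sets α for a class in proportion to the teacher's accuracy on that class, so the student leans on the labels where the teacher is unreliable. All names (`per_class_alpha`, `alpha_max`, `subgroup_aware_distillation_loss`) are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def per_class_alpha(teacher_class_acc, alpha_max=0.9):
    """Hypothetical heuristic: trust the teacher less on classes where it is less accurate.

    teacher_class_acc: teacher accuracy per class, e.g. measured on held-out data.
    """
    return alpha_max * np.clip(np.asarray(teacher_class_acc, dtype=float), 0.0, 1.0)

def subgroup_aware_distillation_loss(student_logits, teacher_logits, labels,
                                     class_alpha, temperature=2.0):
    """Mean of (1 - a_y) * CE(label) + a_y * T^2 * CE(teacher_T, student_T) over the batch."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    # Standard cross-entropy against the one-hot label (temperature 1)
    ce = -log_softmax(student_logits)[np.arange(n), labels]
    # Cross-entropy against the tempered teacher distribution; this differs from
    # KL(teacher || student) only by the teacher's entropy, a constant for the student
    p_teacher = softmax(teacher_logits, temperature)
    soft_ce = -(p_teacher * log_softmax(student_logits, temperature)).sum(axis=-1)
    # Per-example mixing weight, looked up by each example's true class
    alpha = np.asarray(class_alpha, dtype=float)[labels]
    per_example = (1.0 - alpha) * ce + alpha * temperature ** 2 * soft_ce
    return float(per_example.mean())
```

Setting every entry of `class_alpha` to the same value recovers ordinary distillation with a global mixing weight; setting it to zero recovers plain label training. The per-class lookup is what lets the teacher's influence shrink only for the subgroups where it errs.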
