LG · ML · Feb 10, 2020

Subclass Distillation

arXiv:2002.03936v2 · 35 citations
AI Analysis

This work targets efficient model compression for practitioners. It is incremental: it extends existing knowledge distillation techniques with subclasses invented by the teacher.

The paper tackles knowledge distillation from a large teacher model to a smaller student model, particularly when the task has only a few classes. It introduces subclass distillation, in which the teacher invents subclasses during its supervised training and the student is trained to match the resulting subclass probabilities. On datasets with known, natural subclasses, the teacher learns similar subclasses and distillation improves; on clickthrough datasets, where the subclasses are unknown, the student learns faster and better.

After a large "teacher" neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot of information about the way in which the teacher generalizes. By training a small "student" model to match these probabilities, it is possible to transfer most of the generalization ability of the teacher to the student, often producing a much better small model than directly training the student on the training data. The transfer works best when there are many possible classes because more is then revealed about the function learned by the teacher, but in cases where there are only a few possible classes we show that we can improve the transfer by forcing the teacher to divide each class into many subclasses that it invents during the supervised training. The student is then trained to match the subclass probabilities. For datasets where there are known, natural subclasses we demonstrate that the teacher learns similar subclasses and these improve distillation. For clickthrough datasets where the subclasses are unknown we demonstrate that subclass distillation allows the student to learn faster and better.
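The core idea in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the fixed number of subclasses per class, the contiguous layout of subclass logits, the use of a temperature, and the choice to recover class probabilities by summing subclass probabilities are all assumptions made here for illustration. Any additional training terms the paper may use for the teacher are not shown.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs_from_subclasses(subclass_probs, subclasses_per_class):
    """Assumption: subclasses of one class sit next to each other, and a
    class probability is the sum of its subclass probabilities."""
    return [
        sum(subclass_probs[i:i + subclasses_per_class])
        for i in range(0, len(subclass_probs), subclasses_per_class)
    ]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between the teacher's and the student's
    (temperature-softened) subclass distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Hypothetical example: 2 classes, each split into 2 invented subclasses.
teacher_logits = [2.0, 0.5, -1.0, -1.5]
student_logits = [1.0, 1.0, 0.0, 0.0]

subclass_p = softmax(teacher_logits)
class_p = class_probs_from_subclasses(subclass_p, subclasses_per_class=2)
loss = distillation_loss(teacher_logits, student_logits, temperature=2.0)
```

With only two classes, the class-level distribution carries a single free number per example, while the four-way subclass distribution carries three. That is the sense in which matching subclass probabilities reveals more about the function the teacher has learned.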

