LG | ML | Jun 30, 2020

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

arXiv:2007.00051v2 | 7 citations
Originality: Incremental advance
AI Analysis

The paper addresses the accuracy loss that practitioners face when compressing models with knowledge distillation, though it is an incremental improvement over existing methods.

The paper tackles the accuracy gap between teacher and student models in knowledge distillation. It proposes extracurricular learning, which samples from an approximation of the underlying data distribution and matches the student and teacher output distributions on those samples. Compared with standard knowledge distillation, this reduces the gap by 46% to 68%. Compared with empirical risk minimization-based training, it yields a 16% regression error reduction on MPIIGaze and top-1 accuracy gains of up to +9.1% on CIFAR100.

Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures: 16% regression error reduction on the MPIIGaze dataset, +3.4% to +9.1% improvement in top-1 classification accuracy on the CIFAR100 dataset, and +2.9% top-1 improvement on the ImageNet dataset.
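The three steps listed in the abstract can be illustrated with a minimal sketch for the classification case. This is not the authors' implementation. It assumes that output distributions are modeled with a softmax over logits, that the approximation to the data distribution is replaced by a simple stand-in (convex combinations of random pairs of training inputs), and that matching is done with a KL divergence. The names `sample_extended_set`, `distillation_loss`, and `linear_model` are hypothetical and chosen only for this example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Step 1 (assumed form): model an output distribution from raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sample_extended_set(inputs, n_extra, rng):
    """Step 2 (hypothetical stand-in): extend the training inputs with
    convex combinations of random pairs. The paper's actual approximation
    of the data distribution may differ."""
    extra = []
    for _ in range(n_extra):
        a, b = rng.sample(inputs, 2)
        lam = rng.random()
        extra.append([lam * x + (1 - lam) * y for x, y in zip(a, b)])
    return inputs + extra


def distillation_loss(teacher, student, inputs, temperature=1.0):
    """Step 3: average KL between teacher and student outputs over the
    extended set. No ground-truth labels are needed for the extra samples."""
    total = 0.0
    for x in inputs:
        p = softmax(teacher(x), temperature)
        q = softmax(student(x), temperature)
        total += kl_divergence(p, q)
    return total / len(inputs)


def linear_model(weights):
    """Toy model mapping an input vector to one logit per weight row."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
train_inputs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(8)]
teacher = linear_model([[1.0, -1.0], [-0.5, 2.0], [0.3, 0.3]])
student = linear_model([[0.8, -0.7], [-0.4, 1.5], [0.1, 0.2]])

extended = sample_extended_set(train_inputs, n_extra=16, rng=rng)
loss = distillation_loss(teacher, student, extended)
```

In a real training loop the student's parameters would be updated to minimize `loss`. The point of the sketch is only that the matching objective is evaluated on inputs beyond the original training set, where the teacher's output serves as the target.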

