CV · LG · Feb 25, 2022

Learn From the Past: Experience Ensemble Knowledge Distillation

arXiv:2202.12488v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses knowledge distillation for model compression. Its approach improves student performance while reducing training cost, though it is an incremental advance over existing distillation methods.

The paper incorporates intermediate models from the teacher's training process, which the authors call the teacher's experience, into an ensemble distillation method, and reports state-of-the-art results on the CIFAR-100 and ImageNet datasets.

Traditional knowledge distillation transfers the "dark knowledge" of a pre-trained teacher network to a student network but ignores the knowledge in the teacher's training process, which we call the teacher's experience. However, in realistic educational scenarios, the learning experience is often more important than the learning results. In this work, we propose a novel knowledge distillation method that integrates the teacher's experience for knowledge transfer, named experience ensemble knowledge distillation (EEKD). We uniformly save a moderate number of intermediate models from the training process of the teacher model and then integrate the knowledge of these intermediate models with an ensemble technique. A self-attention module adaptively assigns weights to the different intermediate models during knowledge transfer. We explore three principles for constructing EEKD, concerning the quality, weights, and number of intermediate models. Surprisingly, we find that strong ensemble teachers do not necessarily produce strong students. Experimental results on CIFAR-100 and ImageNet show that EEKD outperforms mainstream knowledge distillation methods and achieves state-of-the-art results. In particular, EEKD even surpasses standard ensemble distillation while saving training cost.
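
The abstract does not specify the attention module or the loss, so the following is only a minimal NumPy sketch of the general idea: pick evenly spaced teacher checkpoints, weight their softened predictions per sample, and distill the weighted ensemble into the student. The function names, the dot-product scoring between student and checkpoint logits, and the temperature value are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def uniform_checkpoint_epochs(total_epochs, num_checkpoints):
    # Evenly spaced epochs at which to save intermediate teacher models;
    # the last one coincides with the final epoch.
    step = total_epochs / num_checkpoints
    return [int(round(step * (i + 1))) for i in range(num_checkpoints)]


def attention_weights(student_logits, teacher_logits):
    # student_logits: (B, C); teacher_logits: (K, B, C) for K checkpoints.
    # ASSUMPTION: scaled dot-product scores with the student logits as the
    # query and each checkpoint's logits as a key. The paper's module may differ.
    d = student_logits.shape[-1]
    scores = np.einsum("bc,kbc->bk", student_logits, teacher_logits) / np.sqrt(d)
    return softmax(scores, axis=1)  # (B, K), each row sums to 1


def ensemble_kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Weighted ensemble of the checkpoints' softened predictions.
    w = attention_weights(student_logits, teacher_logits)          # (B, K)
    p_teachers = softmax(teacher_logits / temperature, axis=-1)    # (K, B, C)
    p_ens = np.einsum("bk,kbc->bc", w, p_teachers)                 # (B, C)
    # KL(ensemble || student) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    eps = 1e-12
    log_p_student = np.log(softmax(student_logits / temperature, axis=-1) + eps)
    kl = np.sum(p_ens * (np.log(p_ens + eps) - log_p_student), axis=-1)
    return (temperature ** 2) * kl.mean()
```

In a real training loop the checkpoint logits would come from forward passes of the saved intermediate teachers, and this term would typically be combined with the usual cross-entropy loss on the ground-truth labels.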

