Education distillation: getting student models to learn in schools
This work addresses the challenge of efficient knowledge transfer in deep learning, offering a novel approach that could enhance model performance in various applications, though it appears incremental as it builds on existing distillation frameworks.
The paper tackles the problem of knowledge forgetting in knowledge distillation by proposing Education Distillation (ED), a method that mimics human educational stages to train student models step-by-step, resulting in significant improvements in accuracy and generalization across multiple datasets compared to conventional methods.
This paper introduces a new knowledge distillation method, called education distillation (ED), which is inspired by the structured and progressive nature of human learning. ED mimics the educational stages of primary school, middle school, and university, and introduces teaching reference blocks. The student model is split into a main body and multiple teaching reference blocks so that it learns from teachers step by step. This promotes efficient knowledge distillation while preserving the architecture of the student model. Experimental results on the CIFAR100, Tiny ImageNet, Caltech and Food-101 datasets show that the teaching reference blocks can effectively avoid the problem of forgetting. Compared with conventional single-teacher and multi-teacher knowledge distillation methods, ED significantly improves the accuracy and generalization ability of the student model. These findings highlight the potential of ED to improve model performance across different architectures and datasets, indicating its value in various deep learning scenarios. Code examples are available at: https://github.com/Revolutioner1/ED.git.
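The staged scheme described above can be pictured as a schedule in which the student grows stage by stage. The following is a minimal sketch of one possible reading, not the authors' implementation (the linked repository is the reference for that). The names `Stage`, `build_curriculum`, `block_N` and `reference_block_N` are hypothetical, and two things are assumptions: that one student block is added per stage, and that the final stage needs no teaching reference block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str                       # educational stage label
    student_blocks: List[str]       # main-body blocks active at this stage
    reference_block: Optional[str]  # temporary teaching reference block, if any


def build_curriculum(student_blocks: List[str], stage_names: List[str]) -> List[Stage]:
    """Grow the student by one block per stage (an assumed schedule)."""
    if len(student_blocks) != len(stage_names):
        raise ValueError("this sketch assumes one student block per stage")
    stages = []
    last = len(stage_names) - 1
    for i, name in enumerate(stage_names):
        # Assumption: every stage except the last attaches a reference block
        # to the partial student; the last stage uses the complete student.
        ref = None if i == last else f"reference_block_{i + 1}"
        stages.append(Stage(name, student_blocks[: i + 1], ref))
    return stages


curriculum = build_curriculum(
    ["block_1", "block_2", "block_3"],
    ["primary school", "middle school", "university"],
)
for stage in curriculum:
    print(stage.name, stage.student_blocks, stage.reference_block)
```

In this sketch the last stage contains only the student's own blocks, which is one way to reconcile step-by-step training with the statement that the student architecture is maintained.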