Improving Knowledge Distillation via Transferring Learning Ability
This work addresses a specific bottleneck in knowledge distillation, the capacity gap between teacher and student networks, and offers machine learning practitioners an incremental improvement.
The paper tackles the capacity-gap problem in knowledge distillation by proposing SLKD, a method that transfers learning ability from teacher to student networks, and reports improved performance over existing approaches.
Existing knowledge distillation methods generally follow a teacher-student approach, in which the student network learns solely from a well-trained teacher. However, this approach overlooks the inherent differences in learning ability between the teacher and student networks, which gives rise to the capacity-gap problem. To address this limitation, we propose a novel method called SLKD.
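The abstract does not specify how SLKD transfers learning ability, so the sketch below illustrates only the conventional teacher-student objective it describes as the baseline: the soft-target distillation loss of Hinton et al. (2015), where the student matches the teacher's temperature-softened output distribution in addition to the ground-truth label. The function names and the defaults `temperature=4.0` and `alpha=0.5` are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # assumed default, for illustration only
    alpha: float = 0.5,        # assumed weight between hard and soft terms
) -> float:
    """Baseline (Hinton-style) KD loss for one example; NOT the SLKD objective."""
    # Hard-label term: cross-entropy of the student's T=1 prediction vs. ground truth.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: the student imitates the teacher's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student)
    # The T^2 factor keeps soft-term gradients on a comparable scale as T varies.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

If the student's logits equal the teacher's, the KL term vanishes and the loss reduces to `alpha` times the cross-entropy; a student whose outputs cannot track the teacher's softened distribution keeps paying the soft-term penalty, which is the setting the capacity-gap argument concerns.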