Knowledge Distillation with Refined Logits
This work addresses a specific limitation of logit distillation for model compression: incorrect teacher predictions can mislead the student. It is an incremental improvement over existing logit distillation methods.
The paper tackles misleading teacher predictions in knowledge distillation by introducing Refined Logit Distillation (RLD), which uses label information to dynamically refine teacher logits while preserving class correlations. The authors report that it outperforms existing methods on the CIFAR-100 and ImageNet benchmarks.
Recent research on knowledge distillation has increasingly focused on logit distillation because of its simplicity, effectiveness, and versatility in model compression. In this paper, we introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods. Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions. When they do, the standard distillation loss and the cross-entropy loss pull the student toward different classes, and this widened divergence undermines the consistency of the student model's learning objectives. Previous attempts to use labels to empirically correct teacher predictions can disrupt class correlations. In contrast, RLD employs label information to dynamically refine teacher logits. In this way, our method eliminates misleading information from the teacher while preserving crucial class correlations, thus enhancing the value and efficiency of the distilled knowledge. Experimental results on CIFAR-100 and ImageNet demonstrate RLD's superiority over existing methods. Our code is available at https://github.com/zju-SWJ/RLD.
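To make the motivating conflict concrete, the following is a minimal sketch of the standard temperature-scaled logit distillation loss alongside the cross-entropy loss. It illustrates the baseline setting only and is not the RLD refinement itself, which the abstract does not detail. The function names, the example logits, the temperature of 4, and the T^2 scaling (a common convention) are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Cross-entropy between the student prediction and the ground-truth label."""
    return -math.log(softmax(student_logits)[label])


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (assumed convention for standard logit distillation)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


def teacher_is_wrong(teacher_logits, label):
    """True when the teacher's top-1 class differs from the label."""
    top1 = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return top1 != label


# Hypothetical 3-class example: the true class is 0, but the teacher prefers class 1.
label = 0
teacher_logits = [2.0, 5.0, 1.0]
student_follows_label = [5.0, 2.0, 1.0]    # agrees with the ground truth
student_follows_teacher = [2.0, 5.0, 1.0]  # copies the teacher exactly

for name, s in [("follows label", student_follows_label),
                ("follows teacher", student_follows_teacher)]:
    print(name, "CE:", cross_entropy(s, label), "KD:", kd_loss(s, teacher_logits))
```

In this toy case the student that matches the label has the lower cross-entropy but the higher distillation loss, while the student that copies the wrong teacher has zero distillation loss and a high cross-entropy. This is the inconsistency between the two objectives that the abstract describes.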