CV · LG · Nov 30, 2022

Hint-dynamic Knowledge Distillation

arXiv:2211.17059v1 · 1 citation · h-index: 54
Originality: Incremental advance
AI Analysis

This work targets more effective knowledge transfer in distillation, a practical concern for machine learning practitioners. It is an incremental advance: it builds on existing distillation methods and adds dynamic, per-instance hint utilization.

The paper tackles the problem of inefficient knowledge utilization in knowledge distillation by proposing Hint-dynamic Knowledge Distillation (HKD), which adaptively customizes hint-learning for each instance using a meta-weight network and weight ensembling, resulting in improved performance on CIFAR-100 and Tiny-ImageNet benchmarks.

Knowledge Distillation (KD) transfers knowledge from a high-capacity teacher model to improve a smaller student model. Existing efforts guide the distillation by matching prediction logits, feature embeddings, etc., while leaving how to efficiently utilize them in conjunction less explored. In this paper, we propose Hint-dynamic Knowledge Distillation, dubbed HKD, which excavates the knowledge from the teacher's hints in a dynamic scheme. The guidance effect of the knowledge hints usually varies across instances and learning stages, which motivates us to adaptively customize a specific hint-learning manner for each instance. Specifically, a meta-weight network is introduced to generate instance-wise weight coefficients for the knowledge hints by perceiving the dynamic learning progress of the student model. We further present a weight ensembling strategy that exploits historical statistics to eliminate potential bias in coefficient estimation. Experiments on the standard benchmarks CIFAR-100 and Tiny-ImageNet show that the proposed HKD effectively boosts knowledge distillation.
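The two mechanisms named in the abstract, instance-wise hint weighting and weight ensembling over history, can be illustrated with a rough sketch. This is not the paper's implementation: the abstract does not specify the meta-weight network's inputs or architecture, nor the exact ensembling rule. The sketch below assumes a single linear layer plus softmax over per-hint losses as a stand-in for the meta-weight network, and an exponential moving average as the ensembling rule. All function names and the `momentum` parameter are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def meta_weights(hint_losses, params):
    """Hypothetical stand-in for the meta-weight network.

    hint_losses: per-hint losses for ONE instance (e.g. logit loss, feature loss).
    params: (W, b), a K x K weight matrix and a length-K bias (assumed shapes).
    Returns K non-negative weights that sum to 1.
    """
    W, b = params
    scores = [sum(w * x for w, x in zip(row, hint_losses)) + bias
              for row, bias in zip(W, b)]
    return softmax(scores)

def ensemble(history, new_weights, momentum=0.9):
    """Assumed form of weight ensembling: an exponential moving average
    of this instance's weights across training steps."""
    if history is None:
        return list(new_weights)
    return [momentum * h + (1.0 - momentum) * w
            for h, w in zip(history, new_weights)]

def weighted_hint_loss(hint_losses, weights):
    """Combine the per-hint losses of one instance using its weights."""
    return sum(w * l for w, l in zip(weights, hint_losses))

# Illustrative values only: two hints, identity-like "network" parameters.
params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
step1 = meta_weights([0.8, 0.2], params)   # weights at an early step
history = ensemble(None, step1)            # first observation initializes history
step2 = meta_weights([0.3, 0.6], params)   # weights at a later step
history = ensemble(history, step2)         # smoothed weights used for the loss
total = weighted_hint_loss([0.3, 0.6], history)
```

Because each step's weights come from a softmax, the averaged weights remain a valid convex combination, so the smoothed coefficients still sum to 1 while damping step-to-step fluctuations in the estimate.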
