CV · AI · Sep 3, 2024

Adaptive Explicit Knowledge Transfer for Knowledge Distillation

arXiv:2409.01679v2 · 2 citations · h-index: 2
AI Analysis

This work addresses a specific bottleneck in knowledge distillation for classification tasks, offering an incremental improvement over existing methods.

The paper tackles the inferior performance of logit-based knowledge distillation by proposing an adaptive explicit knowledge transfer method in which the student learns both explicit and implicit knowledge, and it reports improved performance on the CIFAR-100 and ImageNet datasets.

Logit-based knowledge distillation (KD) for classification is cost-efficient compared to feature-based KD but often suffers from inferior performance. Recently, it was shown that the performance of logit-based KD can be improved by effectively delivering the teacher model's probability distribution over the non-target classes, known as 'implicit (dark) knowledge', to the student model. Through gradient analysis, we first show that this has the effect of adaptively controlling the learning of implicit knowledge. Then, we propose a new loss that enables the student to learn explicit knowledge (i.e., the teacher's confidence about the target class) along with implicit knowledge in an adaptive manner. Furthermore, we propose to separate the classification and distillation tasks for effective distillation and inter-class relationship modeling. Experimental results demonstrate that the proposed method, called the adaptive explicit knowledge transfer (AEKT) method, achieves improved performance compared to state-of-the-art KD methods on the CIFAR-100 and ImageNet datasets.
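
To make the abstract's terminology concrete, here is a minimal sketch in plain Python. It is not the AEKT loss, which the abstract does not specify. It shows only the classic temperature-scaled logit-based KD loss and how a softmax output splits into explicit knowledge (the probability of the target class) and implicit knowledge (the renormalised distribution over the non-target classes). The function names, example logits, and temperature of 4.0 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def split_knowledge(probs, target):
    """Split a class distribution into explicit and implicit parts.

    explicit: probability assigned to the target class.
    implicit: distribution over non-target classes, renormalised to sum to 1.
    """
    explicit = probs[target]
    rest = [p for i, p in enumerate(probs) if i != target]
    rest_sum = sum(rest)
    implicit = [p / rest_sum for p in rest]
    return explicit, implicit


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Classic logit-based KD loss: T^2 * KL(teacher || student).

    This is the standard baseline, not the loss proposed in the paper.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_t, p_s)


# Hypothetical logits for a 4-class problem; class 0 is the target.
teacher_logits = [6.0, 2.5, 1.0, -1.0]
student_logits = [3.0, 2.0, 0.5, 0.0]
target = 0

explicit_t, implicit_t = split_knowledge(softmax(teacher_logits), target)
explicit_s, implicit_s = split_knowledge(softmax(student_logits), target)

print("teacher explicit (target confidence):", round(explicit_t, 4))
print("student explicit (target confidence):", round(explicit_s, 4))
print("teacher implicit (non-target dist.): ", [round(p, 4) for p in implicit_t])
print("student implicit (non-target dist.): ", [round(p, 4) for p in implicit_s])
print("classic KD loss:", round(kd_loss(teacher_logits, student_logits), 4))
```

In this sketch the teacher is more confident about the target class than the student, and the two models also distribute the remaining probability mass differently across the non-target classes. These are the two signals the abstract calls explicit and implicit knowledge.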

