CV | Dec 21, 2024

Cross-View Consistency Regularisation for Knowledge Distillation

Weijia Zhang, Dongnan Liu, Weidong Cai, Chao Ma

arXiv:2412.16493v18.713 citations | h-index: 17 | Has Code | MM

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in knowledge distillation for machine learning practitioners, offering an incremental improvement over existing methods.

The paper tackles two issues in logit-based knowledge distillation, an overconfident teacher and confirmation bias, by introducing within-view and cross-view consistency regularisation together with confidence-based soft label mining. It reports new state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets across diverse teacher and student architectures.

Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods have been quickly catching up in performance with their feature-based counterparts. However, previous research has pointed out that logit-based methods are still fundamentally limited by two major issues in their training process, namely an overconfident teacher and confirmation bias. Inspired by the success of cross-view learning in fields such as semi-supervised learning, in this work we introduce within-view and cross-view regularisations to standard logit-based distillation frameworks to combat these issues. We also perform confidence-based soft label mining to improve the quality of distilling signals from the teacher, which further mitigates the confirmation bias problem. Despite its apparent simplicity, the proposed Consistency-Regularisation-based Logit Distillation (CRLD) significantly boosts student learning, setting new state-of-the-art results on the standard CIFAR-100, Tiny-ImageNet, and ImageNet datasets across a diversity of teacher and student architectures, whilst introducing no extra network parameters. Orthogonal to ongoing logit-based distillation research, our method enjoys excellent generalisation properties and, without bells and whistles, boosts the performance of various existing approaches by considerable margins.
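The abstract names three ingredients (within-view consistency, cross-view consistency, and confidence-based soft label mining) without giving their formulas. The following is a minimal sketch of how such terms might be combined, not the authors' implementation. It assumes two augmented views of each image (called "weak" and "strong" here), a temperature-scaled KL divergence as the distillation loss, and a hard confidence threshold on the teacher's top probability as the mining rule. All function names, the temperature, and the threshold are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def masked_kd_loss(teacher_logits, student_logits, temperature=4.0, threshold=0.5):
    """Distillation loss for one teacher/student pair.

    Assumption: a teacher prediction is discarded (loss 0) when its top
    probability at temperature 1 falls below `threshold`. This stands in
    for confidence-based soft label mining; the paper's rule may differ.
    """
    if max(softmax(teacher_logits)) < threshold:
        return 0.0
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # The T^2 factor is the usual gradient-scale correction in logit-based KD.
    return temperature ** 2 * kl_divergence(p, q)


def consistency_loss(t_weak, t_strong, s_weak, s_strong,
                     temperature=4.0, threshold=0.5):
    """Sum of within-view and cross-view terms (equal weighting is an assumption)."""
    def kd(teacher, student):
        return masked_kd_loss(teacher, student, temperature, threshold)

    # Within-view: teacher and student see the same augmented view.
    within = kd(t_weak, s_weak) + kd(t_strong, s_strong)
    # Cross-view: the teacher's prediction on one view supervises the
    # student's prediction on the other view.
    cross = kd(t_weak, s_strong) + kd(t_strong, s_weak)
    return within + cross
```

In this sketch the loss is zero when the student matches the teacher on both views, and a low-confidence teacher prediction contributes nothing, which is one simple way to keep unreliable soft labels from reinforcing student errors.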
