CV · Dec 21, 2024

Cross-View Consistency Regularisation for Knowledge Distillation

arXiv:2412.16493v1 · 13 citations · h-index: 17 · MM
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in knowledge distillation and offers machine learning practitioners an incremental improvement over existing methods.

The paper tackles two problems in logit-based knowledge distillation, an overconfident teacher and confirmation bias, by introducing cross-view consistency regularization and confidence-based soft label mining. It achieves new state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets across diverse architectures.
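Confidence-based soft label mining can be written as a per-sample mask on the usual temperature-scaled KL distillation term. The formulation below assumes a hard threshold tau on the teacher's top-1 probability, and that rule is not confirmed by the summary; the paper's exact criterion may differ. The notation is illustrative: z_i^t and z_i^s are the teacher and student logits for sample i, T is the distillation temperature, and N is the batch size.

```latex
\begin{aligned}
p_i^{t} &= \operatorname{softmax}\!\left(z_i^{t}/T\right), \qquad
p_i^{s} = \operatorname{softmax}\!\left(z_i^{s}/T\right),\\
m_i &= \mathbb{1}\!\left[\max_k \operatorname{softmax}\!\left(z_i^{t}\right)_k \ge \tau\right],\\
\mathcal{L}_{\mathrm{mined}} &= \frac{1}{N}\sum_{i=1}^{N} m_i \, T^2 \, \mathrm{KL}\!\left(p_i^{t} \,\middle\|\, p_i^{s}\right).
\end{aligned}
```

Samples whose teacher prediction falls below tau contribute nothing. The intent is to stop low-quality soft labels from reinforcing the student's own mistakes.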

Knowledge distillation (KD) is an established paradigm for transferring privileged knowledge from a cumbersome model to a lightweight and efficient one. In recent years, logit-based KD methods have been rapidly closing the performance gap with their feature-based counterparts. However, previous research has pointed out that logit-based methods are still fundamentally limited by two major issues in their training process: an overconfident teacher and confirmation bias. Inspired by the success of cross-view learning in fields such as semi-supervised learning, in this work we introduce within-view and cross-view regularisation into standard logit-based distillation frameworks to address these two issues. We also perform confidence-based soft label mining to improve the quality of the distillation signals from the teacher, which further mitigates the confirmation bias problem. Despite its apparent simplicity, the proposed Consistency-Regularisation-based Logit Distillation (CRLD) significantly boosts student learning. It sets new state-of-the-art results on the standard CIFAR-100, Tiny-ImageNet, and ImageNet datasets across diverse teacher and student architectures, whilst introducing no extra network parameters. Because it is orthogonal to ongoing logit-based distillation research, our method generalises well and, without bells and whistles, boosts the performance of various existing approaches by considerable margins.
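The abstract describes the method only at a high level. As a hedged illustration, the sketch below assembles a within-view plus cross-view logit distillation loss in plain Python. Everything concrete in it is an assumption and not the paper's published formulation: the weak/strong augmentation pairing, the hard confidence threshold on the teacher's top-1 probability, the temperature, and the equal weighting of terms. All function names (`softmax`, `kl_div`, `is_confident`, `consistency_kd_loss`) are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: view pairing, threshold and weighting are assumptions,
# not CRLD's published recipe.
Logits = Sequence[float]


def softmax(logits: Logits, temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(target: List[float], pred: List[float], eps: float = 1e-12) -> float:
    """KL(target || pred); the teacher distribution is the target."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred))


def is_confident(teacher_logits: Logits, threshold: float) -> float:
    """Assumed mining rule: 1.0 if the teacher's top-1 probability (T = 1)
    reaches the threshold, else 0.0, so low-confidence targets are dropped."""
    return 1.0 if max(softmax(teacher_logits)) >= threshold else 0.0


def consistency_kd_loss(
    t_weak: List[Logits], t_strong: List[Logits],
    s_weak: List[Logits], s_strong: List[Logits],
    temperature: float = 4.0, threshold: float = 0.5,
) -> float:
    """Hypothetical within-view + cross-view logit distillation loss.

    Each argument holds one logit vector per sample; "weak" and "strong" are
    two augmented views of the same image (an assumption about the setup).
    """
    tau2 = temperature ** 2  # standard T^2 scaling used in logit distillation
    total = 0.0
    for tw, ts, sw, ss in zip(t_weak, t_strong, s_weak, s_strong):
        p_tw, p_ts = softmax(tw, temperature), softmax(ts, temperature)
        p_sw, p_ss = softmax(sw, temperature), softmax(ss, temperature)
        # Per-view confidence masks computed from the teacher's own prediction.
        m_w, m_s = is_confident(tw, threshold), is_confident(ts, threshold)
        # Within-view: each teacher view supervises the student on the same view.
        within = m_w * kl_div(p_tw, p_sw) + m_s * kl_div(p_ts, p_ss)
        # Cross-view: each teacher view supervises the student on the other view.
        cross = m_w * kl_div(p_tw, p_ss) + m_s * kl_div(p_ts, p_sw)
        total += tau2 * (within + cross)
    return total / len(t_weak)
```

With identical teacher and student logits on both views, the loss is zero. A uniform (low-confidence) teacher contributes nothing because its targets are masked out. A real training loop would compute these terms on framework tensors rather than Python lists; pure Python is used here only to keep the sketch dependency-free.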

Code implementations: 1 repo