CV · AI · Feb 9, 2025

Contrastive Representation Distillation via Multi-Scale Feature Decoupling

arXiv:2502.05835v3 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance issues in knowledge distillation for compact neural networks, representing an incremental improvement over existing methods.

The paper targets two problems in feature-based knowledge distillation: semantic confusion and low efficiency. It proposes MSDCRD, a framework that decouples global features into multi-scale local features and applies tailored contrastive losses, and it reports superior performance in both homogeneous and heterogeneous teacher-student settings.

Knowledge distillation enhances the performance of compact student networks by transferring knowledge from more powerful teacher networks without introducing additional parameters. In the feature space, local regions within an individual global feature encode distinct yet interdependent semantic information. Previous feature-based distillation methods mainly emphasize global feature alignment while neglecting the decoupling of local regions within an individual global feature, which often results in semantic confusion and suboptimal performance. Moreover, conventional contrastive representation distillation suffers from low efficiency due to its reliance on a large memory buffer to store feature samples. To address these limitations, this work proposes MSDCRD, a model-agnostic distillation framework that systematically decouples global features into multi-scale local features and leverages the resulting semantically rich feature samples with tailored sample-wise and feature-wise contrastive losses. This design enables efficient distillation using only a single batch, eliminating the dependence on external memory. Extensive experiments demonstrate that MSDCRD achieves superior performance not only in homogeneous teacher-student settings but also in heterogeneous architectures where feature discrepancies are more pronounced, highlighting its strong generalization capability.
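
The idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the pooling grid sizes `(1, 2, 4)`, the temperature `tau`, the use of average pooling, the InfoNCE form of the loss, and the equal weighting of the two loss terms are all assumptions made for illustration. It also assumes student and teacher feature maps already share the same shape; in practice a projection layer would likely be needed.

```python
import numpy as np


def multi_scale_local_features(fmap, scales=(1, 2, 4)):
    """Average-pool a (B, C, H, W) feature map onto s x s grids.

    Returns a dict mapping each scale s to an array of shape (B * s * s, C),
    i.e. one C-dimensional vector per local region.
    Assumes H and W are divisible by every s (hypothetical simplification).
    """
    b, c, h, w = fmap.shape
    out = {}
    for s in scales:
        # Split H and W into s blocks each, then average inside every block.
        blocks = fmap.reshape(b, c, s, h // s, s, w // s).mean(axis=(3, 5))
        # (B, C, s, s) -> (B * s * s, C)
        out[s] = blocks.transpose(0, 2, 3, 1).reshape(b * s * s, c)
    return out


def info_nce(student, teacher, tau=0.1, eps=1e-12):
    """Batch-only InfoNCE: row i of `student` should match row i of `teacher`.

    All other rows in the same batch act as negatives, so no memory
    buffer is involved.
    """
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    logits = s @ t.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def msd_contrastive_loss(f_student, f_teacher, scales=(1, 2, 4), tau=0.1):
    """Sum a sample-wise and a feature-wise contrastive term per scale.

    Assumption: the "feature-wise" term is modeled here by contrasting
    channels (the transposed matrices); the paper's exact loss may differ.
    """
    local_s = multi_scale_local_features(f_student, scales)
    local_t = multi_scale_local_features(f_teacher, scales)
    total = 0.0
    for s in scales:
        total += info_nce(local_s[s], local_t[s], tau)      # rows = local samples
        total += info_nce(local_s[s].T, local_t[s].T, tau)  # rows = channels
    return total / len(scales)
```

Called as `msd_contrastive_loss(student_fmap, teacher_fmap)` on two arrays of shape `(B, C, H, W)`, the sketch draws every negative from the current batch, which is the property the abstract describes as eliminating the dependence on external memory.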

