CV · AI · Jul 16, 2024

Discriminative and Consistent Representation Distillation

arXiv:2407.11802v5 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses efficient model deployment for practitioners by improving knowledge distillation. The advance is incremental: it builds on existing contrastive learning and regularization techniques.

The paper proposes Discriminative and Consistent Distillation (DCD), a knowledge distillation method that combines a contrastive loss with consistency regularization to improve student model performance. It reports state-of-the-art results on CIFAR-100 and ImageNet, with student models sometimes surpassing teacher accuracy.

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its application in knowledge distillation remains limited and focuses primarily on discrimination, neglecting the structural relationships captured by the teacher model. To address this limitation, we propose Discriminative and Consistent Distillation (DCD), which employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations. Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives, replacing the fixed hyperparameters commonly used in contrastive learning approaches. Through extensive experiments on CIFAR-100 and ImageNet ILSVRC-2012, we demonstrate that DCD achieves state-of-the-art performance, with the student model sometimes surpassing the teacher's accuracy. Furthermore, we show that DCD's learned representations exhibit superior cross-dataset generalization when transferred to Tiny ImageNet and STL-10.
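
The two objectives described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the code assumes an InfoNCE-style contrastive term (each student embedding should match its own teacher embedding against the rest of the batch) and a KL-divergence consistency term between teacher and student in-batch similarity distributions. The function names, the `weight` argument, and the default values are hypothetical. The temperature is a fixed float standing in for the learnable parameter, and the learnable bias is left out because the abstract does not say where it enters the loss.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_term(student, teacher, temperature):
    """Assumed InfoNCE form: student i is positive with teacher i,
    and every other teacher embedding in the batch is a negative."""
    losses = []
    for i, s in enumerate(student):
        logits = [dot(s, t) / temperature for t in teacher]
        losses.append(-log_softmax(logits)[i])
    return sum(losses) / len(losses)


def batch_similarity_log_probs(feats, temperature):
    """For each sample, a log-distribution over the other samples in the batch."""
    rows = []
    for i, a in enumerate(feats):
        logits = [dot(a, b) / temperature for j, b in enumerate(feats) if j != i]
        rows.append(log_softmax(logits))
    return rows


def consistency_term(student, teacher, temperature):
    """Assumed consistency form: mean KL(teacher || student) between
    in-batch similarity distributions. Needs a batch of at least 2."""
    log_p_s = batch_similarity_log_probs(student, temperature)
    log_p_t = batch_similarity_log_probs(teacher, temperature)
    kl = 0.0
    for row_s, row_t in zip(log_p_s, log_p_t):
        kl += sum(math.exp(lt) * (lt - ls) for ls, lt in zip(row_s, row_t))
    return kl / len(student)


def dcd_style_loss(student, teacher, temperature=0.1, weight=1.0):
    """Hypothetical combination: contrastive term plus weighted consistency term."""
    student = [l2_normalize(v) for v in student]
    teacher = [l2_normalize(v) for v in teacher]
    return (contrastive_term(student, teacher, temperature)
            + weight * consistency_term(student, teacher, temperature))


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = dcd_style_loss(teacher, teacher)
    shuffled = dcd_style_loss([teacher[1], teacher[2], teacher[0]], teacher)
    print(aligned < shuffled)
```

In this toy setup, a student that reproduces the teacher's embeddings gets a lower loss than one whose embeddings are assigned to the wrong samples. In a real training loop the temperature would be a trainable parameter updated by the optimizer rather than a constant.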

Code Implementations: 1 repo
