CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective
This work addresses over-reliance on per-sample feature similarity in knowledge distillation for computer vision, offering an incremental improvement over existing methods.
The paper tackles over-reliance on feature similarity in knowledge distillation by proposing a contrastive framework that aligns teacher and student logits per sample while preserving semantic relationships across samples. It reports effectiveness on image classification, object detection, and instance segmentation, evaluated on CIFAR-100, ImageNet-1K, and MS COCO.
In this paper, we propose a simple yet effective contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Conventional knowledge distillation approaches rely too heavily on per-sample feature similarity, which risks overfitting, whereas contrastive approaches focus on inter-class discrimination at the expense of intra-sample semantic relationships. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. Specifically, our method first enforces intra-sample alignment by directly minimizing the discrepancy between teacher and student logits for each individual sample. We then use inter-sample contrasts to preserve semantic dissimilarities across samples. By defining positive pairs as teacher-student logits from the same sample and negative pairs as teacher-student logits from different samples, we reformulate these two constraints as an InfoNCE loss, which reduces the computational complexity to below quadratic in the number of samples and removes the dependence on temperature parameters and large batch sizes. We conduct comprehensive experiments on three benchmark datasets, CIFAR-100, ImageNet-1K, and MS COCO, and the results confirm the effectiveness of the proposed method on image classification, object detection, and instance segmentation.
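To make the pairing scheme concrete, the following is a minimal sketch of a sample-wise InfoNCE loss over logits, not the paper's exact formulation. It treats the student and teacher logits of the same sample as the positive pair and the student logits of one sample paired with the teacher logits of every other sample in the batch as negatives. The cosine similarity, the absence of a temperature, and the function names `cosine` and `sample_wise_infonce` are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length logit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def sample_wise_infonce(student_logits, teacher_logits):
    """Mean InfoNCE loss over a batch of per-sample logit vectors.

    Positive pair: (student_logits[i], teacher_logits[i]).
    Negative pairs: (student_logits[i], teacher_logits[j]) for j != i.
    The similarity function is an illustrative assumption.
    """
    n = len(student_logits)
    total = 0.0
    for i in range(n):
        sims = [cosine(student_logits[i], teacher_logits[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        # -log( exp(sim_ii) / sum_j exp(sim_ij) )
        total += log_denominator - sims[i]
    return total / n


# Toy batch: three samples, three classes (hypothetical values).
teacher = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
aligned_student = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
shuffled_student = [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]

aligned_loss = sample_wise_infonce(aligned_student, teacher)
shuffled_loss = sample_wise_infonce(shuffled_student, teacher)
```

In this toy batch, the student whose logits match the teacher sample by sample receives a lower loss than the student whose logits match a different sample's teacher output. That is the behavior the intra-sample alignment and inter-sample contrast are meant to encourage.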