CV | AI | May 24, 2024

Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning

arXiv:2405.15311v3 | h-index: 3 | BMVC
Originality: Incremental advance
AI Analysis

This paper addresses efficient knowledge distillation for lightweight models in self-supervised learning. The contribution appears incremental because it builds on existing distillation methods.

The paper tackles the difficulty a lightweight student has in accurately mimicking a teacher's embedding during self-supervised distillation. It reuses the teacher's projection head for the student, which raises EfficientNet-B0's ImageNet linear-evaluation result to 66.9%, 69.3%, and 69.8% with ResNet-50, ResNet-101, and ResNet-152 teachers, respectively, while using significantly fewer parameters.

Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations with large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose Retro, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear result on ImageNet to 66.9%, 69.3%, and 69.8%, respectively, with significantly fewer parameters.
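The core idea in the abstract, a student that passes its features through the teacher's own projection head, can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say how the student's feature width is matched to the head's input, so the `adapter` layer here is an assumption. The cosine consistency loss, the bias-free linear layers, and the toy dimensions are likewise hypothetical choices made only to keep the example small and dependency-free.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free linear layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def cosine_distance(a, b):
    """Return 1 - cosine similarity, a simple consistency loss (assumption)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Toy sizes, all hypothetical: backbone feature widths and embedding width.
TEACHER_DIM, STUDENT_DIM, EMBED_DIM = 8, 4, 3

# Teacher projection head: pre-trained and kept frozen, shared by both models.
teacher_head = random_matrix(EMBED_DIM, TEACHER_DIM, rng)
# Adapter (assumption): the only new trainable layer, mapping the student's
# narrower features to the width the teacher head expects.
adapter = random_matrix(TEACHER_DIM, STUDENT_DIM, rng)


def teacher_embed(teacher_features):
    return linear(teacher_head, teacher_features)


def student_embed(student_features):
    # The student reuses the teacher's head instead of learning its own.
    return linear(teacher_head, linear(adapter, student_features))


# Stand-ins for backbone outputs on the same image.
teacher_features = [rng.uniform(-1, 1) for _ in range(TEACHER_DIM)]
student_features = [rng.uniform(-1, 1) for _ in range(STUDENT_DIM)]

z_t = teacher_embed(teacher_features)
z_s = student_embed(student_features)
loss = cosine_distance(z_s, z_t)  # would be minimized w.r.t. student + adapter
```

Because both embeddings come out of the same head, they live in the same space by construction. The student then only has to learn backbone features (plus the assumed adapter) that the shared head maps close to the teacher's output.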

Code Implementations: 1 repo
