CV · AI · Jan 12, 2021

SEED: Self-supervised Distillation For Visual Representation

arXiv:2101.04731v2 · 215 citations
Originality: Highly original
AI Analysis

The paper addresses the performance gap for small models in self-supervised learning. The contribution is incremental in that it adapts existing distillation ideas to a self-supervised context.

The paper tackles self-supervised learning for small models, which underperform when trained with contrastive methods. It proposes SEED, a self-supervised distillation method that transfers knowledge from a larger teacher to a smaller student. On ImageNet-1k, SEED improves top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large.

This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large on the ImageNet-1k dataset.
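The training signal described in the abstract, a student that mimics the teacher's similarity score distribution over a set of instances, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `similarity_distillation_loss`, the `anchors` array standing in for the set of instances, the temperature values, and the assumption that student and teacher embeddings share one dimension are all choices made here for the example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def similarity_distillation_loss(student_emb, teacher_emb, anchors,
                                 t_student=0.2, t_teacher=0.01):
    """Cross-entropy between teacher and student similarity distributions.

    student_emb, teacher_emb: (batch, dim) embeddings of the same images.
    anchors: (num_anchors, dim) embeddings of the instance set.
    The temperatures are illustrative assumptions, not values from the source.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    a = l2_normalize(anchors)

    # Teacher's distribution over the anchors: the soft target (no labels used).
    teacher_probs = softmax(t @ a.T / t_teacher)
    # Student's log-distribution over the same anchors.
    student_log_probs = log_softmax(s @ a.T / t_student)

    # Average cross-entropy over the batch.
    return float(-(teacher_probs * student_log_probs).sum(axis=1).mean())


# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
teacher_out = rng.normal(size=(4, 8))
student_out = rng.normal(size=(4, 8))
anchor_set = rng.normal(size=(16, 8))
print(similarity_distillation_loss(student_out, teacher_out, anchor_set))
```

In an actual training loop the teacher would be frozen and only the student encoder would receive gradients from this loss; those details are omitted from the sketch.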

Code Implementations: 1 repo
