CV · Nov 22, 2021

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks

arXiv:2111.11398v2 · 16 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of optimizing self-supervised learning for varied computer vision tasks. It is incremental in that it builds on existing contrastive methods.

The paper investigates how invariance to data augmentations in self-supervised models affects downstream task performance, finding that different tasks require opposite invariances and that fusing complementary representations improves transferability across diverse tasks.

Self-supervised learning is a powerful paradigm for representation learning on unlabelled images. A wealth of effective new methods based on instance matching rely on data-augmentation to drive learning, and these have reached a rough agreement on an augmentation scheme that optimises popular recognition benchmarks. However, there is strong reason to suspect that different tasks in computer vision require features to encode different (in)variances, and therefore likely require different augmentation strategies. In this paper, we measure the invariances learned by contrastive methods and confirm that they do learn invariance to the augmentations used and further show that this invariance largely transfers to related real-world changes in pose and lighting. We show that learned invariances strongly affect downstream task performance and confirm that different downstream tasks benefit from polar opposite (in)variances, leading to performance loss when the standard augmentation strategy is used. Finally, we demonstrate that a simple fusion of representations with complementary invariances ensures wide transferability to all the diverse downstream tasks considered.
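The two ideas in the abstract, measuring how invariant a representation is to a transformation and fusing representations with complementary invariances, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not state which invariance measure or fusion operator is used, so cosine similarity and feature concatenation are assumptions here, and the "images", `flip` transform, and both encoders are hypothetical stand-ins for real data and trained models.

```python
import math
import random


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors stay unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def invariance_score(encoder, images, transform):
    """Mean cosine similarity between features of each image and its transformed copy.

    Assumed measure: values near 1.0 suggest the encoder is largely
    invariant to `transform`; lower values suggest sensitivity to it.
    """
    sims = [cosine_similarity(encoder(img), encoder(transform(img))) for img in images]
    return sum(sims) / len(sims)


def fuse(encoders, image):
    """Concatenate L2-normalised features from several encoders (assumed fusion rule)."""
    fused = []
    for enc in encoders:
        fused.extend(l2_normalize(enc(image)))
    return fused


# Hypothetical toy data: each "image" is a short list of pixel intensities.
random.seed(0)
images = [[random.random() for _ in range(8)] for _ in range(20)]


def flip(img):
    """Toy augmentation: reverse the pixel order."""
    return img[::-1]


def order_free_encoder(img):
    """Toy encoder using order-independent statistics, so it ignores flips."""
    return [sum(img), max(img), min(img)]


def order_aware_encoder(img):
    """Toy encoder that keeps raw pixel order, so it is sensitive to flips."""
    return list(img)


score_invariant = invariance_score(order_free_encoder, images, flip)
score_sensitive = invariance_score(order_aware_encoder, images, flip)
fused_features = fuse([order_free_encoder, order_aware_encoder], images[0])

print(f"flip invariance (order-free encoder):  {score_invariant:.3f}")
print(f"flip invariance (order-aware encoder): {score_sensitive:.3f}")
print(f"fused feature dimension: {len(fused_features)}")
```

In this sketch the fused vector keeps both the flip-invariant and the flip-sensitive features, so a downstream classifier could draw on whichever a task needs. Normalising each encoder's output before concatenation is a design choice made here so that neither encoder dominates through scale alone.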

Code Implementations: 1 repo
