CVNov 17, 2022

Self-Supervised Visual Representation Learning via Residual Momentum

arXiv:2211.09861v29 citationsh-index: 14
Originality Incremental advance
AI Analysis

This addresses a bottleneck in self-supervised learning for computer vision, offering an incremental improvement to existing frameworks.

The paper tackles the performance gap between student and teacher encoders in momentum-based self-supervised learning by proposing residual momentum, which reduces this gap and improves representation learning, achieving state-of-the-art results on benchmark datasets.

Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinders performance on downstream tasks. This paper is the first to investigate and identify this invisible gap as a bottleneck that has been overlooked in the existing SSL frameworks, potentially preventing the models from learning good representation. To solve this problem, we propose "residual momentum" to directly reduce this gap to encourage the student to learn the representation as close to that of the teacher as possible, narrow the performance gap with the teacher, and significantly improve the existing SSL. Our method is straightforward, easy to implement, and can be easily plugged into other SSL frameworks. Extensive experimental results on numerous benchmark datasets and diverse network architectures have demonstrated the effectiveness of our method over the state-of-the-art contrastive learning baselines.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes