CV · Feb 7, 2022

Crafting Better Contrastive Views for Siamese Representation Learning

arXiv:2202.03278v3 · 128 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating high-quality views for self-supervised learning, offering a plug-and-play solution that boosts performance across multiple frameworks and tasks, though it is incremental in nature.

The paper tackles the problem of designing better contrastive pairs for Siamese representation learning by proposing ContrastiveCrop, which improves classification accuracy by 0.4% to 2.0% on datasets like CIFAR-10 and CIFAR-100, and enhances downstream tasks such as detection and segmentation.

Recent self-supervised contrastive learning methods greatly benefit from the Siamese structure, which aims at minimizing distances between positive pairs. For high-performance Siamese representation learning, one of the keys is to design good contrastive pairs. Most previous works simply apply random sampling to make different crops of the same image, which overlooks semantic information and may degrade the quality of views. In this work, we propose ContrastiveCrop, which effectively generates better crops for Siamese representation learning. First, a semantic-aware object localization strategy is proposed within the training process in a fully unsupervised manner. This guides us to generate contrastive views that avoid most false positives (i.e., object vs. background). Moreover, we empirically find that views with similar appearances are trivial for Siamese model training. Thus, a center-suppressed sampling is further designed to enlarge the variance of crops. Remarkably, our method carefully considers positive pairs for contrastive learning with negligible extra training overhead. As a plug-and-play and framework-agnostic module, ContrastiveCrop consistently improves SimCLR, MoCo, BYOL, and SimSiam by 0.4% to 2.0% classification accuracy on CIFAR-10, CIFAR-100, Tiny ImageNet, and STL-10. Superior results are also achieved on downstream detection and segmentation tasks when pre-trained on ImageNet-1K.
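
The two ideas in the abstract, localizing the object and then sampling crops away from its center, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the localization map is obtained or which sampling distribution is used, so everything here is an assumption: a 2D activation heatmap as input, a relative `threshold` to turn it into a bounding box, and a symmetric Beta(alpha, alpha) distribution with alpha < 1 (U-shaped) for the center-suppressed sampling. The function names `bounding_box_from_heatmap` and `sample_crop_center` are hypothetical.

```python
import numpy as np


def bounding_box_from_heatmap(heatmap, threshold=0.1):
    """Return (x0, y0, x1, y1) in [0, 1] image coordinates.

    Assumption: `heatmap` is a 2D array of non-negative activations and the
    object is wherever the min-max normalized value exceeds `threshold`.
    """
    h, w = heatmap.shape
    span = heatmap.max() - heatmap.min()
    norm = (heatmap - heatmap.min()) / (span + 1e-8)
    ys, xs = np.nonzero(norm > threshold)
    if xs.size == 0:
        # Nothing stands out: fall back to the whole image.
        return 0.0, 0.0, 1.0, 1.0
    return (float(xs.min() / w), float(ys.min() / h),
            float((xs.max() + 1) / w), float((ys.max() + 1) / h))


def sample_crop_center(box, alpha=0.6, rng=None):
    """Sample a crop center inside `box`, biased away from the box center.

    Assumption: Beta(alpha, alpha) with alpha < 1 is U-shaped, so centers
    near the box edges are more likely than centers near the middle, which
    gives a larger variance than uniform sampling.
    """
    if rng is None:
        rng = np.random.default_rng()
    x0, y0, x1, y1 = box
    u, v = rng.beta(alpha, alpha, size=2)
    return x0 + u * (x1 - x0), y0 + v * (y1 - y0)


if __name__ == "__main__":
    # Toy heatmap with a small "object" in the upper-right area.
    heat = np.zeros((8, 8))
    heat[2:4, 4:6] = 1.0
    box = bounding_box_from_heatmap(heat)
    center = sample_crop_center(box, rng=np.random.default_rng(0))
    print("box:", box)
    print("crop center:", center)
```

In a real pipeline the sampled center would be combined with a random scale and aspect ratio, as in a standard random resized crop, to produce each of the two views. Constraining the centers to the box is what limits object-vs-background pairs, and the U-shaped distribution is what keeps the two views from looking nearly identical.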

Code Implementations: 1 repo