CV
Oct 23, 2023

SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling

arXiv:2310.14736v2
1 citation
h-index: 2
Originality: Incremental advance
AI Analysis

SAMCLR addresses a bottleneck in self-supervised contrastive learning for computer vision when it is applied to complex scenes, offering an incremental improvement over existing methods.

The paper tackles self-supervised contrastive learning on complex scenes containing multiple objects, where two random views of the same image often show different object categories. It proposes SAMCLR, an add-on to SimCLR that uses SAM to segment each image into semantic regions and samples both views from the same region. In preliminary results, after pre-training on Cityscapes and ADE20K, SAMCLR performs at least on par with, and often significantly outperforms, SimCLR, DINO, and MoCo on classification with CIFAR-10, STL10, and ImageNette.

In computer vision, self-supervised contrastive learning enforces similar representations between different views of the same image. Pre-training is most often performed on image classification datasets, like ImageNet, where images mainly contain a single class of objects. However, when dealing with complex scenes containing multiple items, it becomes very unlikely that several views of the same image represent the same object category. In this setting, we propose SAMCLR, an add-on to SimCLR which uses SAM to segment the image into semantic regions, then samples the two views from the same region. Preliminary results show empirically that when pre-training on Cityscapes and ADE20K, then evaluating on classification with CIFAR-10, STL10 and ImageNette, SAMCLR performs at least on par with, and most often significantly outperforms, not only SimCLR but also DINO and MoCo.
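The region-constrained view sampling described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the sampling procedure, so everything here is an assumption made for illustration: the function name `sample_region_boxes`, the integer region map `region_mask` (standing in for a segmentation derived from SAM, which is not called here), the fixed square crop size, and the rule that each crop is anchored on a random pixel of the chosen region.

```python
import numpy as np


def sample_region_boxes(region_mask, crop_size, rng):
    """Pick one region, then return two crop boxes that each contain
    at least one pixel of that region.

    region_mask: (H, W) integer array with one id per region. It is assumed
        to come from a segmentation model such as SAM (hypothetical input).
    crop_size: side of the square crop, at most min(H, W).
    rng: a numpy.random.Generator.
    Returns (region_id, [(top, left), (top, left)]).
    """
    height, width = region_mask.shape
    half = crop_size // 2

    # Choose one region uniformly among the ids present in the mask.
    region_id = int(rng.choice(np.unique(region_mask)))
    ys, xs = np.nonzero(region_mask == region_id)

    boxes = []
    for _ in range(2):
        # Anchor the crop on a random pixel of the chosen region.
        i = rng.integers(len(ys))
        # Clamp so the crop stays inside the image; the anchor pixel
        # remains inside the crop after clamping.
        top = int(np.clip(ys[i] - half, 0, height - crop_size))
        left = int(np.clip(xs[i] - half, 0, width - crop_size))
        boxes.append((top, left))
    return region_id, boxes


# Toy usage: a 32x32 image whose left half is region 0 and right half region 1.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
region_mask = np.zeros((32, 32), dtype=int)
region_mask[:, 16:] = 1

crop_size = 8
region_id, boxes = sample_region_boxes(region_mask, crop_size, rng)
views = [image[t:t + crop_size, l:l + crop_size] for t, l in boxes]
```

In a SimCLR-style pipeline, the two crops in `views` would then go through the usual augmentations and be treated as a positive pair. This sketch only covers how the pair is chosen.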
