CV, AI, LG | Mar 17, 2022

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation

arXiv:2203.09343v2 | 2 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in contrastive pretraining on complex scene images, where random crops capture semantically inconsistent content, and offers an incremental improvement over preprocessing-based methods.

The paper tackles the problem of semantically inconsistent content in contrastive learning on complex scenes by proposing a framework that jointly learns representations and segmentation, iteratively improving both. Experiments show robust transfer to downstream tasks in classification, detection, and segmentation.

Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO. This gap exists largely because commonly used random crop augmentations obtain semantically inconsistent content in crowded scene images of diverse objects. Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution is still elusive. In this work, we propose a framework which accomplishes this goal via joint learning of representations and segmentation. We leverage segmentation masks to train a model with a mask-dependent contrastive loss, and use the partially trained model to bootstrap better masks. By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout pretraining. Experiments show our representations transfer robustly to downstream tasks in classification, detection and segmentation.
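The alternation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`masked_pool`, `mask_contrastive_loss`, `bootstrap_masks`), the InfoNCE form of the loss, the temperature value, and the use of k-means with farthest-point initialization to regenerate masks are all assumptions chosen for illustration. The sketch pools features inside each mask, contrasts matching regions across two views, and then clusters pixel features to propose new masks.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def masked_pool(features, masks):
    """Average an (H, W, C) feature map inside each of K binary (H, W) masks -> (K, C)."""
    weights = masks.reshape(masks.shape[0], -1).astype(np.float64)  # (K, H*W)
    flat = features.reshape(-1, features.shape[-1])                 # (H*W, C)
    area = weights.sum(axis=1, keepdims=True).clip(min=1.0)         # avoid divide-by-zero
    return weights @ flat / area

def mask_contrastive_loss(feat_a, feat_b, masks_a, masks_b, temperature=0.1):
    """Hypothetical mask-dependent InfoNCE loss: region k in view A is the
    positive for region k in view B; all other regions act as negatives."""
    za = l2_normalize(masked_pool(feat_a, masks_a))
    zb = l2_normalize(masked_pool(feat_b, masks_b))
    logits = za @ zb.T / temperature                                # (K, K)
    logits = logits - logits.max(axis=1, keepdims=True)             # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def bootstrap_masks(features, num_masks, num_iters=10):
    """Assumed bootstrapping step: cosine k-means over pixel features,
    returning (K, H, W) boolean masks that partition the image."""
    h, w, c = features.shape
    flat = l2_normalize(features.reshape(-1, c))
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [flat[0]]
    for _ in range(1, num_masks):
        sim = np.max(flat @ np.stack(centers).T, axis=1)
        centers.append(flat[np.argmin(sim)])
    centers = np.stack(centers)
    for _ in range(num_iters):
        assign = np.argmax(flat @ centers.T, axis=1)
        for k in range(num_masks):
            members = flat[assign == k]
            if len(members) > 0:
                centers[k] = l2_normalize(members.mean(axis=0))
    assign = np.argmax(flat @ centers.T, axis=1)
    return (assign[None, :] == np.arange(num_masks)[:, None]).reshape(num_masks, h, w)
```

In a full training loop under these assumptions, each round would minimize `mask_contrastive_loss` with the current masks, then call `bootstrap_masks` on the partially trained encoder's features to produce the masks for the next round.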

Code Implementations: 1 repo
