CV · LG · Oct 11, 2023

CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping

arXiv:2310.07855v2 · 16 citations · h-index: 75 · Has Code
AI Analysis

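This work addresses a limitation of global (image-level) nearest neighbor bootstrapping in self-supervised learning on scene-centric images, where several objects are entangled in a single global representation. It proposes bootstrapping at the object level instead, with the goal of improving dense visual representations.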

The paper tackles self-supervised representation learning on scene-centric datasets by introducing a Cross-Image Object-Level Bootstrapping method (CrIBo), which achieves state-of-the-art performance in in-context learning based on nearest neighbor retrieval at test time and competitive results on standard downstream segmentation tasks.

Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations. Furthermore, even object-centric datasets stand to benefit from a finer-grained bootstrapping approach. In response to these challenges, we introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning. By employing object-level nearest neighbor bootstrapping throughout training, CrIBo emerges as a notably strong and well-suited candidate for in-context learning that leverages nearest neighbor retrieval at test time. CrIBo shows state-of-the-art performance on this task while being highly competitive in more standard downstream segmentation tasks. Our code and pretrained models are publicly available at https://github.com/tileb1/CrIBo.
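
To make the idea of object-level nearest neighbor retrieval across images more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that each image already comes with patch features and a per-patch object assignment, pools one embedding per object, and then looks up, for each object, the most similar object embedding stored from a different image. The function names `pool_objects` and `cross_image_nearest_neighbors`, the mean pooling, the cosine similarity, and the flat memory bank are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def pool_objects(patch_feats, assignments, num_objects):
    """Average the patch features of each object (hypothetical pooling step).

    patch_feats: (num_patches, dim) array of patch embeddings.
    assignments: (num_patches,) integer object id per patch.
    Returns a (num_objects, dim) array; objects with no patches stay zero.
    """
    pooled = np.zeros((num_objects, patch_feats.shape[1]))
    for k in range(num_objects):
        mask = assignments == k
        if mask.any():
            pooled[k] = patch_feats[mask].mean(axis=0)
    return pooled


def cross_image_nearest_neighbors(queries, query_image_id, bank, bank_image_ids):
    """Return, for each query object, the index of its nearest bank entry.

    Entries that come from the query's own image are masked out, so every
    match is a cross-image match. If the bank holds no other image, the
    returned index is 0 and should be ignored by the caller.
    """
    sims = l2_normalize(queries) @ l2_normalize(bank).T
    sims[:, bank_image_ids == query_image_id] = -np.inf
    return sims.argmax(axis=1)
```

In a training loop, the retrieved neighbor would serve as an additional positive target for the query object. How objects are discovered, how the memory bank is maintained, and which loss is applied are left out here because the abstract does not specify them.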

Code Implementations: 1 repo