CV · Mar 23, 2023

CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

arXiv:2303.13245v1 · 24 citations · h-index: 75 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning dense visual representations without labels. It offers a more generalizable approach that avoids hand-crafted priors, though it appears incremental because it builds on existing clustering and consistency techniques.

The paper tackles unsupervised dense visual representation learning from scene-centric data by introducing CrOC, a method that combines cross-view consistency with online clustering. It reports excellent performance on linear and unsupervised segmentation transfer tasks across various datasets, as well as on video object segmentation.

Learning dense visual representations without labels is an arduous task, and more so from scene-centric data. We tackle this challenging problem with a Cross-view consistency objective and an Online Clustering mechanism (CrOC) to discover and segment the semantics of the views. In the absence of hand-crafted priors, the resulting method is more generalizable and does not require a cumbersome pre-processing step. More importantly, the clustering algorithm operates jointly on the features of both views, thereby elegantly bypassing the issue of content not represented in both views and the ambiguous matching of objects from one crop to the other. We demonstrate excellent performance on linear and unsupervised segmentation transfer tasks on various datasets, and similarly for video object segmentation. Our code and pre-trained models are publicly available at https://github.com/stegmuel/CrOC.
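To make the idea of clustering both views together more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: plain k-means stands in for the paper's online clustering mechanism, and the function names `joint_kmeans` and `cross_view_consistency`, the cosine-based loss, and all parameters are assumptions chosen for illustration. The point it shows is that, because the dense features of both crops are clustered as one set, a cluster that appears in only one view is simply skipped, and no explicit object matching between crops is needed.

```python
import numpy as np


def joint_kmeans(feats_a, feats_b, k, iters=10, seed=0):
    """Cluster the dense features of both views as one set.

    Plain k-means is used here as a stand-in (assumption); it is not the
    clustering algorithm of the paper. feats_a: (Na, D), feats_b: (Nb, D).
    Returns one cluster index per feature, split back per view.
    """
    rng = np.random.default_rng(seed)
    joint = np.concatenate([feats_a, feats_b], axis=0).astype(float)
    # Initialise centroids from k distinct features of the joint set.
    centroids = joint[rng.choice(len(joint), size=k, replace=False)]

    def assign_to_nearest(points, centers):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return dists.argmin(axis=1)

    for _ in range(iters):
        assign = assign_to_nearest(joint, centroids)
        for c in range(k):
            members = joint[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)

    assign = assign_to_nearest(joint, centroids)
    return assign[: len(feats_a)], assign[len(feats_a):]


def cross_view_consistency(feats_a, feats_b, assign_a, assign_b, k):
    """Average (1 - cosine similarity) between the per-view centroids of
    every cluster that is present in both views (hypothetical loss)."""
    losses = []
    for c in range(k):
        in_a, in_b = assign_a == c, assign_b == c
        if in_a.any() and in_b.any():  # content seen in one view only is skipped
            ca = feats_a[in_a].mean(axis=0)
            cb = feats_b[in_b].mean(axis=0)
            cos = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb) + 1e-8)
            losses.append(1.0 - cos)
    return float(np.mean(losses)) if losses else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(49, 16))  # e.g. a 7x7 grid of 16-d features
    view_b = rng.normal(size=(49, 16))
    a, b = joint_kmeans(view_a, view_b, k=4)
    print(cross_view_consistency(view_a, view_b, a, b, k=4))
```

In an actual training setup the features would come from a network and the loss would be differentiable; this sketch only illustrates the data flow under the stated assumptions.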

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
