CV · Oct 19, 2021

Constrained Mean Shift for Representation Learning

arXiv:2110.10309v1
Originality: Incremental advance
AI Analysis

This work addresses representation learning for computer vision. It offers an incremental improvement to non-contrastive methods by incorporating additional knowledge, such as annotated labels or a model trained on another modality.

The paper generalizes the mean-shift algorithm by constraining the search space of nearest neighbors, which yields semantically purer representations. Supervised ImageNet-1k pretraining with this method transfers better than the baselines, and the method is relatively robust to label noise.
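One way to write the objective implied by this description is shown below. It assumes unit-normalized embeddings and a squared Euclidean distance, and the symbols are ours rather than the paper's: $u$ is the query embedding of one augmented view, $v$ is the target embedding of another view, $\mathcal{C}$ is the constrained subset of a memory bank of earlier target embeddings (for example, entries that share $v$'s label), and $\mathrm{NN}_k(v;\,\mathcal{C})$ is the set of the $k$ elements of $\mathcal{C}$ most similar to $v$.

$$
\mathcal{L}(u, v) = \frac{1}{k} \sum_{z \in \mathrm{NN}_k(v;\,\mathcal{C})} \left\| \frac{u}{\|u\|_2} - \frac{z}{\|z\|_2} \right\|_2^2
$$

Setting $\mathcal{C}$ to the whole memory bank recovers an unconstrained mean-shift objective; shrinking $\mathcal{C}$ with extra knowledge is what makes the retrieved neighbors semantically purer.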

We are interested in representation learning from labeled or unlabeled data. Inspired by the recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or from an SSL model of another modality in the SSL setting. Our main idea is to generalize the mean-shift algorithm by constraining the search space of nearest neighbors, resulting in semantically purer representations. Our method simply pulls the embedding of an instance closer to its nearest neighbors in a search space that is constrained using the additional knowledge. Using this non-contrastive loss, we show that supervised ImageNet-1k pretraining with our method yields better transfer performance than the baselines. Further, we demonstrate that our method is relatively robust to label noise. Finally, we show that a noisy constraint from another modality can be used to train self-supervised video models.
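The core step, pulling an embedding toward its nearest neighbors inside a constrained search space, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`constrained_mean_shift_loss`, `label_constraint`, `cross_model_constraint`) and the parameter `k_aux` are hypothetical; the memory-bank setup, counting the target view as one of its own neighbors, and building the cross-modal constraint from the top `k_aux` neighbors of a second model are assumptions. It only computes the loss value; real training would run it inside an autodiff framework with no gradient through the target branch.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def constrained_mean_shift_loss(query, target, memory, allowed, k):
    """Mean squared distance from each query to k neighbors in a constrained set.

    query:   (B, D) online-encoder embeddings of one augmented view
    target:  (B, D) target-encoder embeddings of another view
    memory:  (N, D) memory bank of earlier target embeddings
    allowed: (B, N) bool mask, True where memory[j] is in sample i's search space
    k:       neighbors per sample; the target itself is assumed to be one of them
    """
    if (allowed.sum(axis=1) < k - 1).any():
        raise ValueError("each row of `allowed` needs at least k - 1 entries")
    q, t, m = l2_normalize(query), l2_normalize(target), l2_normalize(memory)
    # Cosine similarity of each target to every memory entry; entries outside
    # the constrained search space are set to -inf so they are never selected.
    sim = np.where(allowed, t @ m.T, -np.inf)
    # Top k - 1 allowed memory entries, plus the target itself as a neighbor.
    nn_idx = np.argsort(-sim, axis=1)[:, : k - 1]
    neighbors = np.concatenate([t[:, None, :], m[nn_idx]], axis=1)  # (B, k, D)
    # For unit vectors, ||a - b||^2 = 2 - 2 <a, b>.
    sq_dist = 2.0 - 2.0 * np.einsum("bd,bkd->bk", q, neighbors)
    return sq_dist.mean()


def label_constraint(batch_labels, memory_labels):
    """Supervised setting: only memory entries with the same (possibly noisy) label."""
    return batch_labels[:, None] == memory_labels[None, :]


def cross_model_constraint(aux_target, aux_memory, k_aux):
    """Hypothetical SSL setting: keep the top k_aux neighbors under a second model."""
    sim = l2_normalize(aux_target) @ l2_normalize(aux_memory).T
    top = np.argsort(-sim, axis=1)[:, :k_aux]
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask


# Usage with random toy embeddings.
rng = np.random.default_rng(0)
B, N, D = 4, 32, 8
query = rng.normal(size=(B, D))
target = query + 0.1 * rng.normal(size=(B, D))
memory = rng.normal(size=(N, D))

sup_mask = label_constraint(np.array([0, 1, 0, 1]), np.arange(N) % 2)
loss_sup = constrained_mean_shift_loss(query, target, memory, sup_mask, k=5)

aux_target = rng.normal(size=(B, 6))  # stand-in for another modality's embeddings
aux_memory = rng.normal(size=(N, 6))
xm_mask = cross_model_constraint(aux_target, aux_memory, k_aux=10)
loss_xm = constrained_mean_shift_loss(query, target, memory, xm_mask, k=5)
```

Expressing the constraint as a boolean mask lets the supervised and cross-modal settings share one loss: label noise or an imperfect second model only corrupts which memory entries are searchable, while the mean-shift step itself is unchanged.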
