CV · Sep 29, 2021

Localizing Objects with Self-Supervised Transformers and no Labels

arXiv:2109.14279v1 · 276 citations · Has Code
Originality: Highly original
AI Analysis

This work aims to reduce annotation costs for object localization in computer vision by localizing objects without any labels, and it reports better results than prior state-of-the-art object discovery methods.

The paper tackles unsupervised object localization in images by proposing LOST, a method that uses features from a self-supervised vision transformer and operates on a single image, without external object proposals or exploration of the image collection. It reports an improvement of up to 8 CorLoc points over state-of-the-art object discovery methods on PASCAL VOC 2012, and a further 7-point gain from training a class-agnostic detector on the discovered objects.

Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
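
The results above are stated in CorLoc points. As a rough illustration of how such a score is commonly computed, the sketch below counts an image as correctly localized when its single predicted box overlaps at least one ground-truth box with an intersection-over-union (IoU) above 0.5, and reports the percentage of such images. The function names, the box format, and the strict 0.5 threshold are assumptions made for this example; they are not taken from the LOST code base.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(
    predictions: Sequence[Box],
    ground_truths: Sequence[Sequence[Box]],
    threshold: float = 0.5,  # assumed convention for this sketch
) -> float:
    """Percentage of images whose predicted box matches any ground-truth box."""
    if len(predictions) != len(ground_truths):
        raise ValueError("expected one prediction and one ground-truth list per image")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for pred, gts in zip(predictions, ground_truths)
        if any(iou(pred, gt) > threshold for gt in gts)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Two toy images: the first prediction matches its ground truth, the second misses.
    preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    gts = [[(0.0, 0.0, 10.0, 10.0)], [(20.0, 20.0, 30.0, 30.0)]]
    print(corloc(preds, gts))
```

Under this reading, an improvement of "8 CorLoc points" means that 8 percent more of the evaluated images receive a box passing the overlap test.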

Code Implementations: 2 repos
