CV · Oct 19, 2023

Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey

arXiv:2310.12904v3 · 20 citations · h-index: 21 · Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for open-world vision systems that perform perception tasks without predefined object categories. As a survey, its contribution is incremental: it summarizes existing work rather than introducing new methods.

This survey tackles the problem of unsupervised object localization by reviewing methods that discover objects in images without manual annotation, focusing on the use of self-supervised pre-trained Vision Transformers (ViTs). It compiles resources and links to various approaches in a public repository.

The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images and videos without knowing in advance which objects populate the dataset is an exciting prospect. But how can we find objects without knowing anything about them? Recent works show that class-agnostic unsupervised object localization is possible by exploiting self-supervised pre-trained features. We present a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation, in the era of self-supervised ViTs. We gather links to the discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.

Code Implementations: 1 repo