CV | Mar 27, 2023

Object Discovery from Motion-Guided Tokens

Zhipeng Bao, Pavel Tokmakov, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert

arXiv:2303.15555v1 | 16.430 citations | h-index: 107 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the fundamental challenge of separating objects from backgrounds in computer vision, offering an incremental improvement over previous methods.

The paper tackles object discovery without manual labels by introducing a transformer decoder that combines motion guidance and tokenization, improving on state-of-the-art results on both synthetic and real datasets.

Object discovery -- separating objects from the background without manual labels -- is a fundamental open challenge in computer vision. Previous methods struggle to go beyond clustering of low-level cues, whether handcrafted (e.g., color, texture) or learned (e.g., from auto-encoders). In this work, we augment the auto-encoder representation learning framework with two key components: motion-guidance and mid-level feature tokenization. Although both have been separately investigated, we introduce a new transformer decoder showing that their benefits can compound thanks to motion-guided vector quantization. We show that our architecture effectively leverages the synergy between motion and tokenization, improving upon the state of the art on both synthetic and real datasets. Our approach enables the emergence of interpretable object-specific mid-level features, demonstrating the benefits of motion-guidance (no labeling) and quantization (interpretability, memory efficiency).
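
To make the tokenization idea concrete, here is a minimal sketch of generic vector quantization (nearest-codebook lookup, in the style of VQ-VAE). It is not the paper's motion-guided variant or its transformer decoder. The function name `quantize`, the toy codebook, and the 2-D feature values are all hypothetical and chosen only for illustration.

```python
def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    Generic vector quantization with squared L2 distance; an illustrative
    assumption, not the architecture described in the abstract.
    """
    indices = []
    for f in features:
        # Squared Euclidean distance from this feature to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        # The token is the index of the closest entry.
        indices.append(dists.index(min(dists)))
    # Replace each feature with its codebook entry (the quantized feature).
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Hypothetical toy data: 3 codebook entries and 4 feature vectors in 2-D.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [0.6, 0.6]]

indices, quantized = quantize(features, codebook)
print(indices)    # one discrete token per feature vector
print(quantized)  # each feature snapped to its codebook entry
```

Storing one integer index per feature instead of a full float vector is one plausible reading of the memory-efficiency benefit mentioned above, and inspecting which features share a token is one way such a representation can be interpreted.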
