CVMar 31, 2023

Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning

arXiv:2303.17842v121 citationsh-index: 10Has Code
Originality Incremental advance
AI Analysis

This work addresses the problem of unstable object-centric representation for researchers in computer vision, offering an incremental improvement for single-view image analysis.

The paper tackles the challenge of inconsistent object-centric learning in single-view images by introducing SLASH, a framework that improves Slot Attention with modules to reduce background noise and guide slot focus, achieving strong performance across four datasets.

Object-centric learning (OCL) aspires general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets to apply various data-driven inductive biases by utilizing geometric or temporal information in the multi-image data. Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do. Hence, owing to the difficulty of applying inductive biases, OCL for single-view images remains challenging, resulting in inconsistent learning of object-centric representation. To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. The new modules, Attention Refining Kernel (ARK) and Intermediate Point Predictor and Encoder (IPPE), respectively, prevent slots from being distracted by the background noise and indicate locations for slots to focus on to facilitate learning of object-centric representation. We also propose a weak semi-supervision approach for OCL, whilst our proposed framework can be used without any assistant annotation during the inference. Experiments show that our proposed method enables consistent learning of object-centric representation and achieves strong performance across four datasets. Code is available at \url{https://github.com/object-understanding/SLASH}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes