LG, CV | Jun 9, 2025

Identifiable Object Representations under Spatial Ambiguities

Avinash Kori, Francesca Toni, Ben Glocker

arXiv:2506.07806v1 | 4.1 | h-index: 13 | ICML

Originality: Highly original

AI Analysis

This paper addresses a fundamental challenge in computer vision that matters for applications requiring human-like reasoning, and it offers incremental improvements over prior single-view methods.

The paper tackles the problem of learning modular object-centric representations under spatial ambiguities such as occlusions and view ambiguities. It introduces a multi-view probabilistic approach that performs robustly on standard benchmarks and on novel, more complex datasets without requiring viewpoint annotations.

Modular object-centric representations are essential for *human-like reasoning* but are challenging to obtain under spatial ambiguities, *e.g. due to occlusions and view ambiguities*. However, addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture *invariant content* information while simultaneously learning disentangled global *viewpoint-level* information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires *no viewpoint annotations*. Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.
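The abstract describes aggregating view-specific slots into view-invariant object content. The toy sketch below illustrates only that aggregation idea, under simplifying assumptions that are not taken from the paper: every view already yields the same number K of slot vectors, slots are matched across views by brute-force minimum squared distance against the first view, and matched slots are then averaged. The function names `align_to_reference` and `aggregate_views` are hypothetical. This is not the authors' probabilistic method, and it does not model the global viewpoint-level latent.

```python
from itertools import permutations
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two slot vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_to_reference(ref: List[Vector], slots: List[Vector]) -> List[Vector]:
    """Reorder `slots` so that slot k best matches ref[k].

    Hypothetical helper: brute force over all K! permutations, which is
    only practical for a handful of slots.
    """
    k = len(ref)
    best = min(
        permutations(range(k)),
        key=lambda p: sum(sq_dist(ref[i], slots[p[i]]) for i in range(k)),
    )
    return [slots[best[i]] for i in range(k)]


def aggregate_views(view_slots: List[List[Vector]]) -> List[Vector]:
    """Average matched slots across views into one content vector per object.

    Assumes every view has the same number of slots and slot dimension;
    the first view fixes the output slot order.
    """
    ref = view_slots[0]
    aligned = [ref] + [align_to_reference(ref, s) for s in view_slots[1:]]
    n, k, d = len(aligned), len(ref), len(ref[0])
    return [
        [sum(view[i][j] for view in aligned) / n for j in range(d)]
        for i in range(k)
    ]


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    # Same two objects seen from another view: different slot order, shifted values.
    view_b = [[0.0, 1.5], [1.5, 0.0]]
    print(aggregate_views([view_a, view_b]))  # [[1.25, 0.0], [0.0, 1.25]]
```

Because the second view lists the objects in the opposite order, naive index-wise averaging would mix the two objects; matching first is what keeps each averaged slot tied to a single object. For larger K, a Hungarian-algorithm solver would be the usual replacement for the O(K!) search.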
