LG · CV · Jun 9, 2025

Identifiable Object Representations under Spatial Ambiguities

arXiv:2506.07806v1 · h-index: 13 · ICML
Originality: Highly original
AI Analysis

The paper addresses a fundamental challenge in computer vision, relevant to applications that require human-like reasoning, and offers incremental improvements over prior single-view methods.

The paper tackles the problem of learning modular object-centric representations under spatial ambiguities like occlusions and view ambiguities, introducing a multi-view probabilistic approach that achieves robust performance on standard benchmarks and novel complex datasets without requiring viewpoint annotations.

Modular object-centric representations are essential for *human-like reasoning* but are challenging to obtain under spatial ambiguities, *e.g., due to occlusions and view ambiguities*. However, addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture *invariant content* information while simultaneously learning disentangled global *viewpoint-level* information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires *no viewpoint annotations*. Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.

