Generalization and Robustness Implications in Object-Centric Learning
This work addresses robustness and generalization in object-centric learning for computer vision, but its contribution is incremental: it primarily tests existing models under new conditions.
The paper evaluates unsupervised object-centric models on five multi-object datasets. It finds that their representations are useful for downstream tasks and generally robust to structured, object-level distribution shifts such as unseen object properties, but that robustness to less structured shifts such as occlusions varies considerably across models.
The idea behind object-centric representation learning is that natural scenes are better modeled as compositions of objects and their relations than as distributed representations. This inductive bias can be injected into neural networks to potentially improve systematic generalization and performance on downstream tasks in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. In addition, we study generalization and robustness by investigating settings where either a single object is out of distribution -- e.g., having an unseen color, texture, or shape -- or global properties of the scene are altered -- e.g., by occlusions, cropping, or increasing the number of objects. From our experimental study, we find object-centric representations to be useful for downstream tasks and generally robust to most distribution shifts affecting objects. However, when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts.
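
To make the two kinds of distribution shift concrete, the following is a minimal sketch on a toy scene. It is not the paper's code or data: the function names (`recolor_object`, `occlude`, `center_crop`), the 8x8 image, and the patch and crop sizes are all hypothetical choices made for illustration. The first function alters a single object (an unseen color), while the other two alter the scene globally (occlusion and cropping).

```python
import numpy as np


def recolor_object(image, mask, color):
    """Object-level shift: paint one object (boolean mask) with an unseen color."""
    shifted = image.copy()
    shifted[mask] = color
    return shifted


def occlude(image, top, left, size, fill=0.5):
    """Global shift: cover a square region with a uniform gray patch."""
    shifted = image.copy()
    shifted[top:top + size, left:left + size] = fill
    return shifted


def center_crop(image, crop):
    """Global shift: keep only the central crop x crop window."""
    height, width = image.shape[:2]
    top, left = (height - crop) // 2, (width - crop) // 2
    return image[top:top + crop, left:left + crop]


# Toy 8x8 RGB scene (hypothetical): black background with one red 3x3 object.
scene = np.zeros((8, 8, 3), dtype=np.float32)
object_mask = np.zeros((8, 8), dtype=bool)
object_mask[2:5, 2:5] = True
scene[object_mask] = (1.0, 0.0, 0.0)

recolored = recolor_object(scene, object_mask, (0.0, 1.0, 1.0))  # object now cyan
occluded = occlude(scene, top=3, left=3, size=4)                 # patch hides part of the object
cropped = center_crop(scene, crop=4)                             # object partly cut off
```

In an evaluation of the kind the abstract describes, a trained model would be run on the original and shifted inputs and its segmentation and property-prediction scores compared; that step is omitted here because it depends on the specific model and metric.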