LG CV, Dec 29, 2023

Generalization properties of contrastive world models

Kandan Ramakrishnan, R. James Cotton, Xaq Pitkow, Andreas S. Tolias

arXiv:2401.00057v1 | 2.6 | h-index: 64

Originality: Synthesis-oriented

AI Analysis

This work examines the generalization problem in AI and highlights limitations of current object-centric models. It is incremental, however, in that it tests existing methods rather than proposing new solutions.

The paper systematically tested the out-of-distribution generalization of contrastive world models under various scenarios like new object attributes and conjunctions, finding that these models fail to generalize, with performance drops correlating with how out-of-distribution the samples are.

Recent work on object-centric world models aims to factorize representations in terms of objects in a completely unsupervised or self-supervised manner. Such world models are hypothesized to be a key component in addressing the generalization problem. While self-supervision has shown improved performance, OOD generalization has not been systematically and explicitly tested. In this paper, we conduct an extensive study of the generalization properties of contrastive world models. We systematically test the model under a number of different OOD generalization scenarios, such as extrapolation to new object attributes and the introduction of new conjunctions or new attributes. Our experiments show that the contrastive world model fails to generalize under the different OOD tests, and that the drop in performance depends on the extent to which the samples are OOD. When visualizing the transition updates and convolutional feature maps, we observe that any change in object attributes (such as previously unseen colors, shapes, or conjunctions of color and shape) breaks down the factorization of object representations. Overall, our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.
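
To make the OOD scenarios named in the abstract concrete, the following is a minimal sketch of how held-out splits over object attributes might be constructed. It is an illustration under stated assumptions, not the authors' protocol: the attribute vocabularies `COLORS` and `SHAPES` and the helper names `conjunction_split` and `attribute_split` are hypothetical, and the paper's actual attribute values and split sizes are not given here.

```python
from itertools import product

# Hypothetical attribute vocabularies (assumption: not taken from the paper).
COLORS = ["red", "green", "blue", "yellow"]
SHAPES = ["circle", "square", "triangle", "star"]


def conjunction_split(colors, shapes):
    """New-conjunction test: every color and every shape appears in
    training, but some (color, shape) pairings are held out."""
    all_pairs = list(product(colors, shapes))
    # Hold out the "diagonal" pairings: the i-th color with the i-th shape.
    held_out = {(c, shapes[i % len(shapes)]) for i, c in enumerate(colors)}
    train = [p for p in all_pairs if p not in held_out]
    test = [p for p in all_pairs if p in held_out]
    return train, test


def attribute_split(colors, shapes, unseen_colors):
    """New-attribute test: whole colors are held out, so test objects
    carry an attribute value never seen during training."""
    all_pairs = list(product(colors, shapes))
    train = [(c, s) for c, s in all_pairs if c not in unseen_colors]
    test = [(c, s) for c, s in all_pairs if c in unseen_colors]
    return train, test


train_conj, test_conj = conjunction_split(COLORS, SHAPES)
train_attr, test_attr = attribute_split(COLORS, SHAPES, {"yellow"})
```

In the conjunction split, a test object such as a red circle is built entirely from familiar parts, whereas in the attribute split a yellow object is novel at the attribute level. The second case is plausibly "more OOD" than the first, which is the kind of gradation the abstract refers to when it says the performance drop depends on the extent to which samples are OOD.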
