Object-Oriented Dynamics Learning through Multi-Level Abstraction
This work aims to improve generalization and interpretability in object-oriented dynamics learning, and represents an incremental advance over existing approaches.
The paper tackles the problem of learning object-based dynamics from visual observations in environments with multiple dynamic objects. It presents MAOP, a framework that significantly outperforms previous methods in sample efficiency and in generalization to novel environments, and whose learned dynamics models enable efficient planning comparable to planning with true environment models.
Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties in common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in terms of sample efficiency and generalization to novel environments when learning environment models. We also demonstrate that the learned dynamics models enable efficient planning in unseen environments, comparable to planning with true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.