Bounding Distributional Shifts in World Modeling through Novelty Detection
This work addresses robustness in model-based planning for robotics, but the contribution is incremental because it builds on existing architectures such as DINO-WM.
The paper tackles the sensitivity of visual world models to training quality by using a variational autoencoder as a novelty detector to prevent divergence during planning, resulting in improved data efficiency in simulated robot environments.
Recent work on visual world models shows significant promise in modeling latent state dynamics obtained from pre-trained image backbones. However, most current approaches are sensitive to training quality: they require near-complete coverage of the action and state space during training to prevent divergence during inference. To make model-based planning more robust to the quality of the learned world model, we propose using a variational autoencoder as a novelty detector, so that action trajectories proposed during planning do not cause the learned model to deviate from the training data distribution. To evaluate the effectiveness of this approach, we carried out a series of experiments in challenging simulated robot environments, with the proposed method incorporated into a model-predictive control policy loop that extends the DINO-WM architecture. The results show that the proposed method improves over state-of-the-art solutions in terms of data efficiency.
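The general idea of steering a planner away from out-of-distribution states can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D integrator `dynamics`, the interval-based `novelty` score, the random-shooting planner `plan`, and the `penalty_weight` parameter are all hypothetical stand-ins chosen for illustration. In the setting described above, the novelty score would come from a variational autoencoder and the dynamics from a learned latent world model.

```python
import random


def dynamics(state, action):
    """Toy stand-in for a learned dynamics model: a 1-D integrator."""
    return state + action


def novelty(state, train_lo=-1.0, train_hi=1.0):
    """Hypothetical novelty score: distance outside the interval assumed
    to be covered by training data. A VAE-based detector would fill this
    role in the approach described in the abstract."""
    if state < train_lo:
        return train_lo - state
    if state > train_hi:
        return state - train_hi
    return 0.0


def trajectory_cost(state, actions, goal, penalty_weight):
    """Terminal distance to the goal plus a weighted novelty penalty
    accumulated over every state visited during the rollout."""
    total_novelty = 0.0
    for action in actions:
        state = dynamics(state, action)
        total_novelty += novelty(state)
    return abs(state - goal) + penalty_weight * total_novelty


def plan(state, goal, horizon=5, num_samples=256, penalty_weight=10.0, seed=0):
    """Random-shooting planner: sample action sequences uniformly and
    return the sequence with the lowest novelty-penalized cost."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = trajectory_cost(state, actions, goal, penalty_weight)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost
```

In a model-predictive control loop, one would call `plan` at each step, execute only the first returned action, and replan from the resulting state. Setting `penalty_weight` to zero recovers an unconstrained planner, which is free to propose rollouts through states the model never saw during training.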