Slot Structured World Models
This work addresses a bottleneck in building intelligent systems: perceiving individual objects and modeling their interactions. The contribution is incremental, since it builds on existing Slot Attention and graph neural network methods.
The paper tackles the problem of extracting object-centric representations and disentangling multiple similar objects in world models. It introduces Slot Structured World Models (SSWM), which combine an object-centric encoder with a latent graph-based dynamics model, and reports that SSWM consistently outperforms baselines on multi-step prediction tasks in the Spriteworld benchmark.
The ability to perceive and reason about individual objects and their interactions is a key goal in building intelligent artificial systems. State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interactions between these object embeddings. However, the feedforward encoder cannot extract {\it object-centric} representations, nor can it disentangle multiple objects with similar appearance. To solve these issues, we introduce {\it Slot Structured World Models} (SSWM), a class of world models that combines an {\it object-centric} encoder (based on Slot Attention) with a latent graph-based dynamics model. We evaluate our method on the Spriteworld benchmark with simple rules of physical interaction, where Slot Structured World Models consistently outperform baselines on a range of (multi-step) prediction tasks with action-conditional object interactions. All code to reproduce the paper's experiments is available from \url{https://github.com/JonathanCollu/Slot-Structured-World-Models}.
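The two-stage design described above (a Slot Attention encoder followed by a latent graph-based transition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the weight matrices (`Wq`, `Wk`, `Wv`, `We`, `Wn`), the dimensions, and the random inputs are all hypothetical. The slot update omits the GRU, MLP, and layer normalization of full Slot Attention, and broadcasting one action vector to every slot is an assumption made here for brevity.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(slots, feats, Wq, Wk, Wv):
    """One simplified Slot Attention iteration (no GRU, MLP, or LayerNorm)."""
    d = slots.shape[-1]
    q = slots @ Wq                      # (K, D) queries from slots
    k = feats @ Wk                      # (N, D) keys from input features
    v = feats @ Wv                      # (N, D) values from input features
    logits = k @ q.T / np.sqrt(d)       # (N, K)
    # Softmax over the slot axis: slots compete for each input feature.
    attn = softmax(logits, axis=1)
    # Normalise over inputs so each slot takes a weighted mean of values.
    weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
    return weights.T @ v                # (K, D) updated slots

def gnn_transition(slots, action, We, Wn):
    """Predict next slots from pairwise messages plus an action (assumed shared)."""
    K, D = slots.shape
    msgs = np.zeros((K, D))
    for i in range(K):
        for j in range(K):
            if i != j:
                # Edge function on the (receiver, sender) slot pair.
                edge_in = np.concatenate([slots[i], slots[j]])
                msgs[i] += np.tanh(edge_in @ We)
    # Node function: slot, aggregated messages, and the action vector.
    node_in = np.concatenate([slots, msgs, np.tile(action, (K, 1))], axis=1)
    # Residual update: the network predicts a change in latent state.
    return slots + np.tanh(node_in @ Wn)

# Hypothetical sizes: N input features, K slots, D latent dims, A actions.
rng = np.random.default_rng(0)
N, K, D, A = 16, 3, 8, 4
feats = rng.normal(size=(N, D))         # stand-in for CNN feature map
slots = rng.normal(size=(K, D))         # randomly initialised slots
Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
We = rng.normal(size=(2 * D, D)) * 0.1
Wn = rng.normal(size=(2 * D + A, D)) * 0.1

for _ in range(3):                      # a few refinement iterations
    slots = slot_attention_step(slots, feats, Wq, Wk, Wv)

action = np.eye(A)[1]                   # one-hot action
next_slots = gnn_transition(slots, action, We, Wn)
print(next_slots.shape)                 # (3, 8)
```

Multi-step prediction in this sketch would amount to applying `gnn_transition` repeatedly to its own output with a sequence of actions, without re-encoding intermediate observations.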