Empowering Multi-Robot Cooperation via Sequential World Models
This work addresses the problem of enabling advanced cooperative behaviors in multi-robot systems. The contribution is incremental in that it builds on existing model-based reinforcement learning methods.
The paper extends model-based reinforcement learning to physical multi-robot cooperation by proposing the Sequential World Model (SeqWM), which outperforms state-of-the-art model-based and model-free baselines in overall performance and sample efficiency on the Bi-DexHands and Multi-Quadruped benchmarks, and is validated on physical quadruped robots.
Model-based reinforcement learning (MBRL) has achieved remarkable success in robotics due to its high sample efficiency and planning capability. However, extending MBRL to physical multi-robot cooperation remains challenging due to the complexity of joint dynamics. To address this challenge, we propose the Sequential World Model (SeqWM), a novel framework that integrates the sequential paradigm into multi-robot MBRL. SeqWM employs independent, autoregressive agent-wise world models to represent joint dynamics, where each agent generates its future trajectory and plans its actions based on the predictions of its predecessors. This design lowers modeling complexity and enables the emergence of advanced cooperative behaviors through explicit intention sharing. Experiments on Bi-DexHands and Multi-Quadruped demonstrate that SeqWM outperforms existing state-of-the-art model-based and model-free baselines in both overall performance and sample efficiency, while exhibiting advanced cooperative behaviors such as predictive adaptation, temporal alignment, and role division. Furthermore, SeqWM has been successfully deployed on physical quadruped robots, validating its effectiveness in real-world multi-robot systems. Demos and code are available at: https://github.com/zhaozijie2022/seqwm
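The sequential paradigm described in the abstract, in which each agent predicts its own future trajectory and plans conditioned on the predictions of its predecessors, can be illustrated with a minimal toy sketch. This is not the SeqWM implementation: the learned agent-wise world models are replaced by a hand-written 1D dynamics function, the planner is plain random shooting, and every name here (`rollout`, `plan_agent`, `sequential_plan`, `min_gap`) is a hypothetical choice made for illustration only.

```python
import random


def rollout(x0, actions):
    """Toy stand-in for an agent-wise world model (assumed): x_{t+1} = x_t + a_t."""
    traj, x = [], x0
    for a in actions:
        x = x + a
        traj.append(x)
    return traj


def plan_agent(x0, goal, predecessor_trajs, rng, horizon=5, samples=256, min_gap=0.5):
    """Random-shooting planner conditioned on predecessors' predicted trajectories."""
    best_cost, best_actions, best_traj = float("inf"), None, None
    for _ in range(samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        traj = rollout(x0, actions)
        # Task cost: distance of the final predicted state from the goal.
        cost = abs(traj[-1] - goal)
        # Cooperation cost: penalise predicted states that come too close to
        # a predecessor's predicted state at the same time step.
        for other in predecessor_trajs:
            for x, y in zip(traj, other):
                if abs(x - y) < min_gap:
                    cost += 10.0
        if cost < best_cost:
            best_cost, best_actions, best_traj = cost, actions, traj
    return best_actions, best_traj


def sequential_plan(starts, goals, horizon=5, seed=0):
    """Plan agents one after another, passing predicted trajectories down the order."""
    rng = random.Random(seed)
    shared = []  # predicted trajectories ("intentions") of agents planned so far
    plans = []
    for x0, goal in zip(starts, goals):
        actions, traj = plan_agent(x0, goal, shared, rng, horizon=horizon)
        plans.append(actions)
        shared.append(traj)
    return plans, shared


if __name__ == "__main__":
    plans, trajs = sequential_plan(starts=[0.0, 3.0], goals=[2.0, 1.0])
    for i, traj in enumerate(trajs):
        print(f"agent {i} predicted trajectory:", [round(x, 2) for x in traj])
```

In this sketch the first agent plans without constraints, while each later agent sees the predicted trajectories of all earlier agents. Each agent therefore searches only its own action space rather than the joint one, which is the intuition behind the lower modeling complexity mentioned above.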