RO LG MA | Sep 29, 2025

Curriculum Imitation Learning of Distributed Multi-Robot Policies

Jesús Roche, Eduardo Sebastián, Eduardo Montijano

arXiv:2509.25097v2 | 5.71 citations | h-index: 21

Originality: Incremental advance

AI Analysis

This work addresses scalable and robust policy learning for multi-robot systems. The advance is incremental: it builds on existing imitation learning frameworks and adds specific enhancements.

The paper tackles the challenge of learning distributed multi-robot control policies by improving long-term coordination and by working around the difficulty of obtaining realistic training data. The reported outcome is improved long-term accuracy and robustness to realistic uncertainty across two tasks with varying team sizes and noise levels.

Learning control policies for multi-robot systems (MRS) remains a major challenge due to long-term coordination and the difficulty of obtaining realistic training data. In this work, we address both limitations within an imitation learning framework. First, we shift the typical role of Curriculum Learning in MRS, from scalability with the number of robots, to focus on improving long-term coordination. We propose a curriculum strategy that gradually increases the length of expert trajectories during training, stabilizing learning and enhancing the accuracy of long-term behaviors. Second, we introduce a method to approximate the egocentric perception of each robot using only third-person global state demonstrations. Our approach transforms idealized trajectories into locally available observations by filtering neighbors, converting reference frames, and simulating onboard sensor variability. Both contributions are integrated into a physics-informed technique to produce scalable, distributed policies from observations. We conduct experiments across two tasks with varying team sizes and noise levels. Results show that our curriculum improves long-term accuracy, while our perceptual estimation method yields policies that are robust to realistic uncertainty. Together, these strategies enable the learning of robust, distributed controllers from global demonstrations, even in the absence of expert actions or onboard measurements.
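The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear schedule, the 2D state, the circular sensing radius, the Gaussian noise model, and the function names `curriculum_horizon` and `egocentric_observation` are all assumptions made for illustration only.

```python
import numpy as np


def curriculum_horizon(epoch, num_epochs, min_len, max_len):
    """Trajectory length to train on at a given epoch.

    Assumption: a linear ramp from min_len to max_len. The abstract only
    says the length of expert trajectories is increased gradually.
    """
    frac = min(1.0, epoch / max(1, num_epochs - 1))
    return int(round(min_len + frac * (max_len - min_len)))


def egocentric_observation(positions, headings, i, radius,
                           noise_std=0.0, rng=None):
    """Approximate what robot i would perceive from a global 2D state.

    positions: (N, 2) array of global positions.
    headings:  (N,) array of yaw angles in radians.
    Returns the indices of neighbors within `radius` and their positions
    expressed in robot i's body frame, with optional Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 1) Filter neighbors: keep robots within the (assumed) sensing radius.
    offsets = positions - positions[i]
    dists = np.linalg.norm(offsets, axis=1)
    visible = dists <= radius
    visible[i] = False  # a robot does not observe itself
    idx = np.flatnonzero(visible)

    # 2) Convert reference frames: rotate world-frame offsets by -heading.
    c, s = np.cos(headings[i]), np.sin(headings[i])
    world_to_body = np.array([[c, s], [-s, c]])
    local = offsets[idx] @ world_to_body.T

    # 3) Simulate sensor variability (assumed zero-mean Gaussian noise).
    local = local + rng.normal(0.0, noise_std, size=local.shape)
    return idx, local
```

Under these assumptions, a training loop would truncate each expert trajectory to `curriculum_horizon(epoch, ...)` steps and feed every robot the output of `egocentric_observation` instead of the global state.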
