RO · May 12

Coordinated Diffusion: Generating Multi-Agent Behavior Without Multi-Agent Demonstrations

Lasse Peters, Laura Ferranti, Javier Alonso-Mora, Andrea Bajcsy

arXiv:2605.11485 · 67.5

Predicted impact: top 27% in RO · last 90 days

Originality: Highly original

AI Analysis

This work addresses the data bottleneck in multi-agent imitation learning by enabling coordination from single-agent data, which is crucial for applications like multi-robot systems where coordinated demonstrations are expensive to obtain.

CoDi generates coordinated multi-agent behavior using only single-agent demonstrations and a user-defined cost function, avoiding the need for costly multi-agent data. In two-arm manipulation tasks, it achieves robust coordination with higher data efficiency than multi-agent baselines.

Imitation learning powered by generative models has proven effective for modeling complex single-agent behaviors. However, teaching multi-agent systems, such as multiple arms or vehicles, to coordinate through imitation learning is hindered by a fundamental data bottleneck: as the joint state-action space grows exponentially with the number of agents, collecting a sufficient amount of coordinated multi-agent demonstrations becomes extremely costly. In this work, we ask: how can we leverage single-agent demonstration data to learn multi-agent policies? We present Coordinated Diffusion (CoDi), a framework that couples independently trained single-agent diffusion policies through a user-defined multi-agent cost function, without requiring any coordinated demonstrations. We derive a new diffusion-based sampling scheme wherein the diffusion score function decomposes into independent, single-agent pre-trained base policies plus a cost-driven guidance term that coordinates these base policies into cohesive multi-agent behavior. We show that this guidance term can be estimated in a gradient-free manner, making CoDi applicable to black-box, non-differentiable cost functions without additional training. Theoretically and empirically, we analyze the conditions under which this composition can faithfully approximate a target multi-agent behavior. We find a complementary role for demonstration data versus the cost function: single-agent demonstrations must cover the support of the desired multi-agent behavior, while the cost function must promote desired behavior from this product of single-agent policies. Our results in simulation and hardware experiments on a two-arm manipulation task show that CoDi discovers robust coordinated behavior from single-agent data and is more data-efficient than multi-agent baselines; they also highlight the importance of joint guidance, base policy support, and cost design.
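The score decomposition described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation, and several things are assumptions made only for illustration: the target joint distribution is taken to be the product of the base policies reweighted by exp(-cost / temperature); each agent's "base policy" is a 1-D Gaussian with an analytic score, standing in for a trained diffusion policy; and all function names and parameters (`base_score`, `sample_x0_given_xt`, `guidance_term`, `collision_cost`, `temperature`) are hypothetical. Under those assumptions, the guidance term at a noise level with scaling `alpha` and noise `sigma` equals (alpha / sigma^2) times the difference between the cost-weighted and unweighted means of denoised samples, which can be estimated by self-normalized importance weighting without differentiating the cost.

```python
import numpy as np

def base_score(x_t, mu, s, alpha, sigma):
    """Analytic score of each agent's noised Gaussian base policy.

    Assumption: agent i's clean action is N(mu[i], s[i]^2) and the forward
    process is x_t = alpha * x_0 + sigma * noise, independently per agent.
    """
    return -(x_t - alpha * mu) / (alpha**2 * s**2 + sigma**2)

def sample_x0_given_xt(x_t, mu, s, alpha, sigma, n, rng):
    """Draw n denoised joint samples from the product of base posteriors."""
    precision = 1.0 / s**2 + alpha**2 / sigma**2
    mean = (mu / s**2 + alpha * x_t / sigma**2) / precision
    return mean + rng.standard_normal((n, x_t.size)) / np.sqrt(precision)

def guidance_term(x0_samples, cost, alpha, sigma, temperature):
    """Gradient-free guidance estimate from cost evaluations only.

    Returns (alpha / sigma^2) * (cost-weighted mean - plain mean) of the
    denoised samples; the cost is treated as a black box.
    """
    log_w = -np.array([cost(x0) for x0 in x0_samples]) / temperature
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    weighted_mean = w @ x0_samples
    return alpha / sigma**2 * (weighted_mean - x0_samples.mean(axis=0))

def collision_cost(x0):
    """Hypothetical non-differentiable cost: penalize agents closer than 0.5."""
    return 1.0 if abs(x0[0] - x0[1]) < 0.5 else 0.0

# Two agents with identical base policies, evaluated at one noise level.
mu, s = np.zeros(2), np.ones(2)
alpha, sigma = 0.8, 0.6
x_t = np.array([0.3, -0.3])

rng = np.random.default_rng(0)
x0 = sample_x0_given_xt(x_t, mu, s, alpha, sigma, 20000, rng)
guidance = guidance_term(x0, collision_cost, alpha, sigma, temperature=0.1)

# Coordinated score = independent single-agent scores + joint guidance.
coordinated = base_score(x_t, mu, s, alpha, sigma) + guidance
```

In this toy setting the guidance pushes the two agents apart, because denoised samples in which they sit within 0.5 of each other are down-weighted. A constant cost yields zero guidance, so the coordinated score falls back to the independent base scores.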
