LGAIOct 10, 2025

Rate optimal learning of equilibria from data

arXiv:2510.09325v11 citationsh-index: 61
Originality Highly original
AI Analysis

This work addresses theoretical gaps in multi-agent imitation learning, providing foundational insights and practical algorithms for researchers and practitioners in AI and robotics.

The paper tackles the problem of learning equilibria in multi-agent imitation learning (MAIL) by characterizing limits in non-interactive settings and introducing an interactive algorithm, MAIL-WARM, which improves sample complexity from O(ε^{-8}) to O(ε^{-2}), matching a lower bound, with numerical validation in environments like grid worlds.

We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation concentrability coefficient as the fundamental complexity measure, and we show that Behavior Cloning (BC) is rate-optimal. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. It improves the best previously known sample complexity from $\mathcal{O}(\varepsilon^{-8})$ to $\mathcal{O}(\varepsilon^{-2}),$ matching the dependence on $\varepsilon$ implied by our lower bound. Finally, we provide numerical results that support our theory and illustrate, in environments such as grid worlds, where Behavior Cloning fails to learn.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes