Rate optimal learning of equilibria from data
This work addresses theoretical gaps in multi-agent imitation learning, providing foundational insights and practical algorithms for researchers and practitioners in AI and robotics.
The paper tackles the problem of learning equilibria in multi-agent imitation learning (MAIL) by characterizing limits in non-interactive settings and introducing an interactive algorithm, MAIL-WARM, which improves sample complexity from O(ε^{-8}) to O(ε^{-2}), matching a lower bound, with numerical validation in environments like grid worlds.
We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation concentrability coefficient as the fundamental complexity measure, and we show that Behavior Cloning (BC) is rate-optimal. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. It improves the best previously known sample complexity from $\mathcal{O}(\varepsilon^{-8})$ to $\mathcal{O}(\varepsilon^{-2}),$ matching the dependence on $\varepsilon$ implied by our lower bound. Finally, we provide numerical results that support our theory and illustrate, in environments such as grid worlds, where Behavior Cloning fails to learn.