GT LG OC MLAug 27, 2024

Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning

arXiv:2408.15173v13.37 citationsh-index: 5

Originality Incremental advance

AI Analysis

This work addresses the problem of scaling multi-agent reinforcement learning to large, heterogeneous systems for researchers and practitioners, though it is incremental by building on existing MFG frameworks.

The paper tackles the limitation of mean-field games (MFGs) in multi-agent reinforcement learning by extending them to handle approximate symmetry and heterogeneity in real-world scenarios, resulting in a method that learns approximate Nash equilibria with finite-sample guarantees and a sample complexity of O(ε^{-6}) for certain games, validated on benchmarks with thousands of agents.

Mean-field games (MFG) have become significant tools for solving large-scale multi-agent reinforcement learning problems under symmetry. However, the assumption of exact symmetry limits the applicability of MFGs, as real-world scenarios often feature inherent heterogeneity. Furthermore, most works on MFG assume access to a known MFG model, which might not be readily available for real-world finite-agent games. In this work, we broaden the applicability of MFGs by providing a methodology to extend any finite-player, possibly asymmetric, game to an "induced MFG". First, we prove that $N$-player dynamic games can be symmetrized and smoothly extended to the infinite-player continuum via explicit Kirszbraun extensions. Next, we propose the notion of $α,β$-symmetric games, a new class of dynamic population games that incorporate approximate permutation invariance. For $α,β$-symmetric games, we establish explicit approximation bounds, demonstrating that a Nash policy of the induced MFG is an approximate Nash of the $N$-player dynamic game. We show that TD learning converges up to a small bias using trajectories of the $N$-player game with finite-sample guarantees, permitting symmetrized learning without building an explicit MFG model. Finally, for certain games satisfying monotonicity, we prove a sample complexity of $\widetilde{\mathcal{O}}(\varepsilon^{-6})$ for the $N$-agent game to learn an $\varepsilon$-Nash up to symmetrization bias. Our theory is supported by evaluations on MARL benchmarks with thousands of agents.

View on arXiv PDF

Similar