LGAIGTMAMLFeb 7, 2023

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

arXiv:2302.03673v333 citationsh-index: 49
Originality Highly original
AI Analysis

This work addresses the scalability issue in multi-agent reinforcement learning for applications with many agents, offering a significant improvement over existing methods that suffer from exponential complexity.

The paper tackles the problem of multi-agent reinforcement learning in large state spaces with many agents by proposing a new model, independent linear Markov game, and designing algorithms to learn Markov coarse correlated equilibria and correlated equilibria with sample complexity that scales polynomially with each agent's function class complexity, breaking the curse of multiagents. The result includes sample complexity bounds, such as an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity for Markov CCE in the tabular case, improving the state-of-the-art $\widetilde{O}(\epsilon^{-3})$.

We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(ε^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(ε^{-3})$ in Daskalakis et al. 2022, where $ε$ is the desired accuracy, and also significantly improves other problem parameters.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes