LG AI GT OC | Feb 13, 2025

Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games

arXiv:2502.09780v19.44 citations | h-index: 6 | ICML

Originality: Highly original

AI Analysis

This work addresses efficient multi-agent learning for applications in which agents interact in a shared unknown environment, proposing a model-based algorithm for online Markov games.

The authors tackle sample-efficient multi-agent reinforcement learning in Markov games. Their algorithm, VMG, achieves near-optimal regret for finding Nash equilibria of two-player zero-sum Markov games and coarse correlated equilibria of multi-player general-sum Markov games, in an online environment with linear function approximation.

Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment. A prominent framework for studying MARL is Markov games, with the goal of finding various notions of equilibria in a sample-efficient manner, such as the Nash equilibrium (NE) and the coarse correlated equilibrium (CCE). However, existing sample-efficient approaches require either tailored uncertainty estimation under function approximation or careful coordination of the players. In this paper, we propose a novel model-based algorithm, called VMG, that incentivizes exploration by biasing the empirical estimate of the model parameters towards those with higher collective best-response values of all the players when the other players' policies are fixed, thus encouraging the policy to deviate from its current equilibrium for more exploration. VMG is oblivious to different forms of function approximation, and permits simultaneous and uncoupled policy updates of all players. Theoretically, we also establish that VMG achieves near-optimal regret for finding both the NEs of two-player zero-sum Markov games and the CCEs of multi-player general-sum Markov games under linear function approximation in an online environment, nearly matching counterparts that rely on sophisticated uncertainty quantification.
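
The idea of biasing the model estimate towards higher collective best-response values can be illustrated on a toy problem. The sketch below is not the paper's VMG algorithm: it assumes a two-player zero-sum matrix game instead of a Markov game, a finite set of candidate payoff matrices instead of linear function approximation, and a squared loss as the data-fit term. The names `best_response_values`, `nash_gap`, `squared_loss`, `value_incentivized_choice`, and the weight `alpha` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Observation = Tuple[int, int, float]  # (row action, column action, observed payoff)


def best_response_values(payoff: Matrix, mu: Sequence[float],
                         nu: Sequence[float]) -> Tuple[float, float]:
    """Best-response values in a zero-sum matrix game (row player maximizes).

    Returns (row_br, col_br), both measured as payoff to the row player:
    row_br is the best the row player can get against the fixed nu,
    col_br is the lowest the column player can push it against the fixed mu.
    """
    rows, cols = len(payoff), len(payoff[0])
    row_br = max(sum(payoff[i][j] * nu[j] for j in range(cols)) for i in range(rows))
    col_br = min(sum(mu[i] * payoff[i][j] for i in range(rows)) for j in range(cols))
    return row_br, col_br


def nash_gap(payoff: Matrix, mu: Sequence[float], nu: Sequence[float]) -> float:
    """Duality gap of (mu, nu); it is zero exactly at a Nash equilibrium."""
    row_br, col_br = best_response_values(payoff, mu, nu)
    return row_br - col_br


def squared_loss(model: Matrix, observations: Sequence[Observation]) -> float:
    """Assumed data-fit term: squared error on the observed entries only."""
    return sum((model[i][j] - r) ** 2 for i, j, r in observations)


def value_incentivized_choice(candidates: Sequence[Matrix],
                              observations: Sequence[Observation],
                              mu: Sequence[float], nu: Sequence[float],
                              alpha: float) -> int:
    """Index of the candidate minimizing loss - alpha * collective best-response value."""
    def objective(model: Matrix) -> float:
        row_br, col_br = best_response_values(model, mu, nu)
        # The column player's own payoff is the negative of the row player's,
        # so its best-response value from its own perspective is -col_br.
        collective = row_br + (-col_br)
        return squared_loss(model, observations) - alpha * collective
    return min(range(len(candidates)), key=lambda k: objective(candidates[k]))


# Two candidate models that agree on every observed entry and differ only
# on the unobserved entry (1, 1).
model_a = [[1.0, -1.0], [-1.0, 1.0]]
model_b = [[1.0, -1.0], [-1.0, 3.0]]
data = [(0, 0, 1.0), (0, 1, -1.0)]
uniform = [0.5, 0.5]
chosen = value_incentivized_choice([model_a, model_b], data, uniform, uniform, alpha=0.1)
```

In this toy setup both candidates fit the data equally well, so the value term breaks the tie in favor of the model under which the current policies look most exploitable. That plays the role an explicit exploration bonus would otherwise play. With `alpha=0.0` the rule reduces to plain loss minimization.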
