LG, AI, GT, MA, ML (Jun 12, 2023)

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

arXiv:2306.07465v2, 6 citations, h-index: 49
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting multi-agent learning to dynamic environments. The advance is incremental, but it provides theoretical guarantees for a broad class of games.

The paper tackles the problem of learning equilibria in non-stationary multi-agent reinforcement learning systems with bandit feedback, achieving regret bounds of Õ(Δ^{1/4}T^{3/4}) when the non-stationarity degree Δ is known and Õ(Δ^{1/5}T^{4/5}) when Δ is unknown over T rounds.
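To get a feel for the two rates, the sketch below evaluates them numerically. It is only an illustration: the constants and logarithmic factors hidden by the Õ notation are dropped, and the function names (`regret_known`, `regret_unknown`, `gap_factor`) are made up for this example rather than taken from the paper. Dividing the two rates gives (T/Δ)^{1/20}, which is the multiplicative price of not knowing Δ, and both rates become linear in T once Δ reaches T.

```python
def regret_known(delta: float, T: float) -> float:
    """Rate Delta^(1/4) * T^(3/4), ignoring constants and log factors."""
    return delta ** 0.25 * T ** 0.75


def regret_unknown(delta: float, T: float) -> float:
    """Rate Delta^(1/5) * T^(4/5), ignoring constants and log factors."""
    return delta ** 0.2 * T ** 0.8


def gap_factor(delta: float, T: float) -> float:
    """Ratio unknown/known: Delta^(1/5 - 1/4) * T^(4/5 - 3/4) = (T/Delta)^(1/20)."""
    return (T / delta) ** 0.05


if __name__ == "__main__":
    T = 10 ** 6
    for delta in (1, 10 ** 2, 10 ** 4, 10 ** 6):
        known = regret_known(delta, T)
        unknown = regret_unknown(delta, T)
        # Per-round regret (rate / T) shrinks only while delta grows slower than T.
        print(f"delta={delta:>8}  known/T={known / T:.4f}  "
              f"unknown/T={unknown / T:.4f}  ratio={gap_factor(delta, T):.3f}")
```

The ratio grows very slowly with T/Δ, so under these simplified rates the cost of an unknown Δ is modest compared with the cost of the non-stationarity itself.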

We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve $\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)$ regret when the degree of non-stationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on the number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria.
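The idea of wrapping stationary learning and testing oracles can be pictured with a generic learn, commit, test, restart loop. The sketch below is not the paper's algorithm: the oracle signatures, the schedule parameters (`learn_budget`, `commit_len`, `test_budget`), and the toy environment are all assumptions made up for illustration, and the toy is a deterministic single-agent stand-in rather than a real game.

```python
from typing import Callable, List, Tuple

# Hypothetical oracle signatures (assumed for this sketch, not from the paper):
#   learn(start_round, budget) -> policy
#   test(policy, start_round, budget) -> True if the policy still passes
LearnOracle = Callable[[int, int], int]
TestOracle = Callable[[int, int, int], bool]


def restart_on_failure(
    learn: LearnOracle,
    test: TestOracle,
    horizon: int,
    learn_budget: int,
    commit_len: int,
    test_budget: int,
) -> List[Tuple[int, str]]:
    """Learn a policy, commit to it, test periodically, relearn on failure.

    Returns (round, event) pairs so the schedule can be inspected.
    The final phase may overshoot `horizon`; a real implementation would cap it.
    """
    log: List[Tuple[int, str]] = []
    t = 0
    while t < horizon:
        policy = learn(t, learn_budget)      # stationary learning oracle
        t += learn_budget
        log.append((t, f"learned:{policy}"))
        while t < horizon:
            t += commit_len                  # play the learned policy
            if t >= horizon:
                break
            passed = test(policy, t, test_budget)  # stationary testing oracle
            t += test_budget
            if not passed:
                log.append((t, "restart"))   # environment drifted: relearn
                break
    return log


# Toy stand-in environment: the best action switches once, at round 500.
def best_action(t: int) -> int:
    return 0 if t < 500 else 1


def toy_learn(start: int, budget: int) -> int:
    return best_action(start)


def toy_test(policy: int, start: int, budget: int) -> bool:
    return policy == best_action(start)


if __name__ == "__main__":
    for entry in restart_on_failure(toy_learn, toy_test, 1000, 50, 100, 10):
        print(entry)
```

In this toy run the wrapper learns once, detects the switch at the first test after round 500, and relearns. How often to test and how long to commit is exactly the trade-off the abstract points to, since each test itself incurs regret under bandit feedback.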

