GT · LG · OC · Sep 10, 2018

Multi-agent online learning in time-varying games

arXiv:1809.03066v3 · 45 citations
AI Analysis

This work addresses the challenge of dynamic game environments for multi-agent systems, representing an incremental advancement in online learning theory.

The paper studies multi-agent online learning in time-varying games by analyzing policies based on mirror descent. It shows that play converges to Nash equilibrium when the games stabilize to a strictly monotone limit, and that play stays asymptotically close to the evolving equilibrium when the stage games are strongly monotone. The results hold under both gradient-based and bandit feedback.

We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback, i.e., the "bandit feedback" case where players only get to observe the payoffs of their chosen actions.
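To make claim (a) concrete, here is a minimal sketch of the simplest member of the mirror descent family: Euclidean projection, i.e. projected gradient ascent, with gradient-based feedback. Everything specific in it is an assumption for illustration and is not taken from the paper: the two-player quadratic payoffs, the action set [0, 1], the 1/t drift in the payoff parameters, and the step-size schedule t^(-0.6). The stage games are strongly monotone and stabilize to a limit game whose unique Nash equilibrium is (0.4, 0.4).

```python
# Hypothetical toy game (not from the paper). Player i picks x_i in [0, 1].
# Stage-t payoffs:
#   u_1(x) = x_1 * (c1_t - 0.5 * x_2) - 0.5 * x_1**2
#   u_2(x) = x_2 * (c2_t - 0.5 * x_1) - 0.5 * x_2**2
# with c1_t = 0.6 + 0.3 / t and c2_t = 0.6 - 0.3 / t, so the game
# stabilizes to the limit c = (0.6, 0.6).


def clip(z, lo=0.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, z))


def payoff_gradients(x, t):
    """Each player's payoff gradient with respect to their own action at stage t."""
    c1 = 0.6 + 0.3 / t
    c2 = 0.6 - 0.3 / t
    v1 = c1 - x[0] - 0.5 * x[1]
    v2 = c2 - x[1] - 0.5 * x[0]
    return v1, v2


def run(T=5000, x0=(1.0, 0.0)):
    """Run T rounds of simultaneous projected gradient ascent."""
    x = list(x0)
    for t in range(1, T + 1):
        step = t ** -0.6  # assumed decaying step-size schedule
        v1, v2 = payoff_gradients(x, t)  # both gradients use the current profile
        x = [clip(x[0] + step * v1), clip(x[1] + step * v2)]
    return x


if __name__ == "__main__":
    # The limit game's Nash equilibrium solves
    # x_1 + 0.5 * x_2 = 0.6 and 0.5 * x_1 + x_2 = 0.6, i.e. x = (0.4, 0.4).
    print(run())
```

Under these assumptions the final iterate should land close to (0.4, 0.4). Other regularizers (for example, entropic mirror descent on a simplex) and the bandit-feedback case would need a different update and a payoff-based gradient estimate, which this sketch does not attempt.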

