GT LGDec 13, 2023

Learning in Zero-Sum Markov Games: Relaxing Strong Reachability and Mixing Time Assumptions

arXiv:2312.08008v32.33 citationsh-index: 5

Originality Incremental advance

AI Analysis

This work addresses the problem of learning in competitive multi-agent systems for researchers and practitioners, offering a more practical approach by reducing stringent assumptions, though it is incremental in improving upon prior methods.

The paper tackles decentralized learning in zero-sum Markov games by relaxing strong reachability and mixing time assumptions, achieving finite-time convergence to an approximate Nash equilibrium with a new algorithm based on Tsallis entropy regularization.

We address payoff-based decentralized learning in infinite-horizon zero-sum Markov games. In this setting, each player makes decisions based solely on received rewards, without observing the opponent's strategy or actions nor sharing information. Prior works established finite-time convergence to an approximate Nash equilibrium under strong reachability and mixing time assumptions. We propose a convergent algorithm that significantly relaxes these assumptions, requiring only the existence of a single policy (not necessarily known) with bounded reachability and mixing time. Our key technical novelty is introducing Tsallis entropy regularization to smooth the best-response policy updates. By suitably tuning this regularization, we ensure sufficient exploration, thus bypassing previous stringent assumptions on the MDP. By establishing novel properties of the value and policy updates induced by the Tsallis entropy regularizer, we prove finite-time convergence to an approximate Nash equilibrium.

View on arXiv PDF

Similar