GTLGMar 3, 2023

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

arXiv:2303.03100v117 citationsh-index: 68
Originality Incremental advance
AI Analysis

This addresses the challenge of efficient learning in multi-agent stochastic games, providing theoretical guarantees for independent learning dynamics, though it is incremental as it builds on existing methods like TD-learning and best-response dynamics.

The paper tackles the problem of learning in two-player zero-sum stochastic games by proposing Doubly Smoothed Best-Response dynamics, which integrates smoothed best-response into TD-learning and minimax value iteration, achieving finite-sample guarantees with an O~(1/ε^2) sample complexity bound for payoff-based independent learning dynamics, and a sharper O~(1/ε) bound for matrix games.

We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known $\tilde{\mathcal{O}}(1/ε^2)$ sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper $\tilde{\mathcal{O}}(1/ε)$ sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes