GTLGDSOct 8, 2020

Fictitious play in zero-sum stochastic games

arXiv:2010.04223v659 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of equilibrium computation in stochastic games for researchers in game theory and reinforcement learning, though it appears incremental as it builds on existing fictitious play and Q-learning methods.

The authors tackled the problem of learning in two-player zero-sum stochastic games by proposing a novel variant of fictitious play dynamics combined with Q-learning, and they showed that the beliefs on strategies converge to a stationary mixed Nash equilibrium in both model-based and model-free cases.

We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games. Our dynamics involves players forming beliefs on the opponent strategy and their own continuation payoff (Q-function), and playing a greedy best response by using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q-functions occurs at a slower timescale than update of the beliefs on strategies. We show both in the model-based and model-free cases (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes