Provable Fictitious Play for General Mean-Field Games
This provides a provably convergent single-loop algorithm for mean-field games, addressing a gap in the literature for iterative updates of both components, which is incremental but improves efficiency over prior methods.
The paper tackles the problem of learning Nash equilibria in stationary mean-field games by proposing a reinforcement learning algorithm that alternatively updates the mean-field state and policy using gradient-descent and proximal policy optimization, and proves it converges at a sublinear rate.
We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium. When viewing the mean-field state and the policy as two players, we propose a fictitious play algorithm which alternatively updates the mean-field state and the policy via gradient-descent and proximal policy optimization, respectively. Our algorithm is in stark contrast with previous literature which solves each single-agent reinforcement learning problem induced by the iterates mean-field states to the optimum. Furthermore, we prove that our fictitious play algorithm converges to the Nash equilibrium at a sublinear rate. To the best of our knowledge, this seems the first provably convergent single-loop reinforcement learning algorithm for mean-field games based on iterative updates of both mean-field state and policy.