Multi-Agent Reinforcement Learning in Cournot Games
This work provides a theoretical convergence guarantee for multi-agent reinforcement learning in economic models, though the contribution is incremental because it builds on existing policy gradient methods.
The paper tackles the problem of strategic agents learning in continuous-action Cournot games with limited information, proving that policy gradient dynamics converge to the Nash equilibrium when the price function is linear or when there are only two agents.
In this work, we study the interaction of strategic agents in continuous-action Cournot games with limited information feedback. The Cournot game is the essential market model for many socio-economic systems in which agents learn and compete without full knowledge of the system or of each other. We consider the dynamics of the policy gradient algorithm, a widely adopted reinforcement learning algorithm for continuous control, in concave Cournot games. We prove that policy gradient dynamics converge to the Nash equilibrium when the price function is linear or the number of agents is two. To the best of our knowledge, this is the first convergence result for learning algorithms with continuous action spaces that do not fall in the no-regret class.
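To make the linear-price case concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a linear inverse demand P(Q) = a - b*Q, a common constant marginal cost c, and Gaussian policies with fixed variance, so that the expected policy gradient with respect to each agent's mean quantity reduces to a - b*Q - b*q_i - c. The sampling noise of a real policy gradient estimator is deliberately omitted, and all names and parameter values (`run_dynamics`, `a`, `b`, `c`, `lr`) are illustrative assumptions.

```python
# Sketch: noise-free (expected) policy gradient dynamics in a linear Cournot game.
# Assumptions (not from the source): price P(Q) = a - b*Q, identical marginal
# cost c, Gaussian policies with fixed variance, simultaneous updates.


def nash_quantity(a, b, c, n):
    """Symmetric Nash equilibrium quantity for n firms with linear price."""
    return (a - c) / (b * (n + 1))


def expected_policy_gradient(means, a, b, c):
    """Gradient of each agent's expected profit w.r.t. its own mean quantity.

    Profit of agent i is q_i * (a - b*Q) - c*q_i with Q the total quantity,
    so the derivative with respect to q_i is a - b*Q - b*q_i - c.
    """
    total = sum(means)
    return [a - b * total - b * m - c for m in means]


def run_dynamics(means, a, b, c, lr=0.1, steps=500):
    """Simultaneous gradient ascent on every agent's mean quantity."""
    means = list(means)
    for _ in range(steps):
        grads = expected_policy_gradient(means, a, b, c)
        # Quantities cannot be negative, so project back onto [0, inf).
        means = [max(0.0, m + lr * g) for m, g in zip(means, grads)]
    return means


# Three agents starting from different quantities.
final = run_dynamics([0.5, 1.0, 4.0], a=10.0, b=1.0, c=1.0)
print("learned quantities:", final)
print("Nash quantity:     ", nash_quantity(10.0, 1.0, 1.0, 3))
```

In this sketch each agent uses only the derivative of its own profit, and the step size is chosen small enough (below 2 / (b * (n + 1))) for the simultaneous update to be a contraction. A sampled estimator such as REINFORCE would replace `expected_policy_gradient` with a noisy estimate built from observed profits.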