LGJun 10, 2021

Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents

Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng

arXiv:2106.05492v33.11 citationsHas Code

Originality Highly original

AI Analysis

This work addresses automated mechanism design for robust policy learning against multiple agents, with incremental extensions to boundedly rational settings.

The paper tackles the problem of training a robust principal policy in multi-agent general-sum games by using no-regret dynamics to efficiently identify worst-case strategic responses, and demonstrates a 15% welfare improvement in a simulated economy with boundedly rational agents.

We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic responses of other agents, which is generally NP-hard. However, we show that no-regret dynamics can identify these worst-case responses in poly-time in smooth games. We propose a framework that uses this policy evaluation method for efficiently learning a robust principal policy using RL. This framework can be extended to provide robustness to boundedly rational agents too. Our motivating application is automated mechanism design: we empirically demonstrate our framework learns robust mechanisms in both matrix games and complex spatiotemporal games. In particular, we learn a dynamic tax policy that improves the welfare of a simulated trade-and-barter economy by 15%, even when facing previously unseen boundedly rational RL taxpayers.

View on arXiv PDF Code

Similar