AI GT LG Dec 4, 2020

Learning in two-player games between transparent opponents

arXiv:2012.02671v24.16 citations Has Code

Originality: Incremental advance

AI Analysis

This research is significant for developers of multi-agent reinforcement learning systems, as it highlights the complex dynamics and potential pitfalls of transparency and opponent-aware learning in social dilemmas, particularly those with equilibrium selection problems.

This paper studies two reinforcement learning agents that play matrix games with mutually transparent decision-making, so that each agent can predict the other's play and anticipate and shape the other's gradient steps. The authors find that this setup, combined with opponent-aware learning, robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, convergence to a mutually beneficial outcome is much harder, and opponent-aware learning can even produce worst-case outcomes for both agents.
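
To make the contrast between the two games concrete, the following minimal Python sketch enumerates the pure-strategy Nash equilibria of a prisoner's dilemma and a game of chicken. The payoff values and the helper name `pure_nash_equilibria` are illustrative assumptions, not taken from the paper; any payoffs with the standard ordering for each game would give the same qualitative picture.

```python
from itertools import product

# Hypothetical payoff tables (not from the paper):
# payoffs[(row_action, col_action)] = (row_payoff, col_payoff).
# Action 0 = cooperate / swerve, action 1 = defect / go straight.
PRISONERS_DILEMMA = {
    (0, 0): (-1, -1), (0, 1): (-3, 0),
    (1, 0): (0, -3),  (1, 1): (-2, -2),
}
CHICKEN = {
    (0, 0): (0, 0),  (0, 1): (-1, 1),
    (1, 0): (1, -1), (1, 1): (-10, -10),
}


def pure_nash_equilibria(payoffs):
    """Return all action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product((0, 1), repeat=2):
        # Row player compares its payoff against every alternative row action.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in (0, 1))
        # Column player compares its payoff against every alternative column action.
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in (0, 1))
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PRISONERS_DILEMMA))  # [(1, 1)]
print(pure_nash_equilibria(CHICKEN))            # [(0, 1), (1, 0)]
```

With these assumed payoffs, the single-shot prisoner's dilemma has one pure equilibrium (mutual defection), whereas chicken has two, each favouring a different player. That second case is the equilibrium selection problem: each agent would like to steer the other towards its own preferred equilibrium, and if both insist, they end up in the worst joint outcome.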

We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them. To prevent an infinite regress of both agents recursively predicting each other indefinitely, each agent is required to give an opponent-independent response with some probability at least epsilon. Transparency also allows each agent to anticipate and shape the other agent's gradient step, i.e. to move to regions of parameter space in which the opponent's gradient points in a direction favourable to them. We study the resulting dynamics experimentally, using two algorithms from previous literature (LOLA and SOS) for opponent-aware learning. We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma. In a game of chicken, in which both agents try to manoeuvre their opponent towards their preferred equilibrium, converging to a mutually beneficial outcome turns out to be much harder, and opponent-aware learning can even lead to worst-case outcomes for both agents. This highlights the need to develop opponent-aware learning algorithms that achieve acceptable outcomes in social dilemmas involving an equilibrium selection problem.
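
The epsilon-grounding described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's implementation: the three-parameter `Policy` and the function `cooperates` are hypothetical names, and prediction is modelled as sampling the opponent's action by simulating the opponent. With probability `epsilon` an agent gives an opponent-independent response, so the mutual recursion terminates with probability 1 and has expected depth 1/epsilon.

```python
import random
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical parameterisation, for illustration only.
    p_base: float       # P(cooperate) in the opponent-independent response
    p_if_coop: float    # P(cooperate) after predicting that the opponent cooperates
    p_if_defect: float  # P(cooperate) after predicting that the opponent defects


def cooperates(me, opponent, epsilon, rng):
    """Sample whether `me` cooperates against `opponent` (requires epsilon > 0)."""
    if rng.random() < epsilon:
        # Opponent-independent branch: this is what stops the infinite regress.
        return rng.random() < me.p_base
    # Otherwise predict the opponent by simulating how it plays against `me`.
    predicted = cooperates(opponent, me, epsilon, rng)
    p = me.p_if_coop if predicted else me.p_if_defect
    return rng.random() < p


# Usage: a "mirror" agent cooperates by default and copies its prediction.
mirror = Policy(p_base=1.0, p_if_coop=1.0, p_if_defect=0.0)
defector = Policy(p_base=0.0, p_if_coop=0.0, p_if_defect=0.0)
rng = random.Random(0)
n = 10_000
rate = sum(cooperates(mirror, defector, 0.2, rng) for _ in range(n)) / n
print(f"mirror vs defector cooperation rate: {rate:.3f}")
```

In this toy, two mirror agents always cooperate with each other, while a mirror agent facing an unconditional defector cooperates only in its opponent-independent branch, so its cooperation rate is close to `epsilon`. The sketch covers only the transparent decision step; the opponent-aware gradient updates (LOLA, SOS) studied in the paper are not shown.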

