LG · GN · AO · SOC-PH · Sep 1, 2022

Intrinsic fluctuations of reinforcement learning promote cooperation

arXiv:2209.01013v2 · 28 citations · h-index: 15
AI Analysis

This work addresses the problem of designing cooperative algorithms for multi-agent systems, with implications for regulating collusive effects, though its contribution is incremental.

The study investigated how intrinsic stochastic fluctuations in temporal-difference reinforcement learning with epsilon-greedy strategies promote cooperation in the iterated Prisoner's dilemma, finding that these fluctuations double the final cooperation rate to up to 80%.

In this work, we ask and answer the question of what makes classical temporal-difference reinforcement learning with epsilon-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms that promote cooperation, the conditions under which agents learn to cooperate remain contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner's dilemma with one-period memory as a testbed: each of the two learning agents learns a strategy that conditions its next action on both agents' actions in the previous round. We find that, besides a strong weighting of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation, to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process but a critical asset for learning cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and for regulating undesired collusive effects.
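To make the setting concrete, here is a minimal sketch of two epsilon-greedy temporal-difference learners in the iterated Prisoner's dilemma with one-period memory, where each agent's state is the previous round's joint action seen from its own perspective. It is not the authors' code. The payoff values (T=5, R=3, P=1, S=0), the plain Q-learning update, the parameter defaults, and the helper names `simulate`, `epsilon_greedy`, and `state_index` are all assumptions chosen for illustration; the paper's exact TD variant and settings may differ.

```python
import random

# Actions: 0 = cooperate (C), 1 = defect (D).
C, D = 0, 1

# Assumed standard Prisoner's dilemma payoffs with T > R > P > S (not from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {  # (my action, other's action) -> my reward
    (C, C): R, (C, D): S,
    (D, C): T, (D, D): P,
}


def state_index(own: int, other: int) -> int:
    """One-period memory: encode the previous joint action (own, other) as 0..3."""
    return 2 * own + other


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return C if q_row[C] >= q_row[D] else D


def simulate(steps=50_000, alpha=0.05, gamma=0.9, epsilon=0.01, seed=0):
    """Run two Q-learners; return (cooperation rate over the last 10% of steps, Q-tables)."""
    rng = random.Random(seed)
    # One Q-table per agent: 4 memory states x 2 actions, initialised to zero.
    q = [[[0.0, 0.0] for _ in range(4)] for _ in range(2)]
    a1, a2 = rng.randrange(2), rng.randrange(2)  # random initial joint action
    window = max(1, steps // 10)
    coop_count = 0
    for t in range(steps):
        # Each agent observes the last joint action from its own perspective.
        s1, s2 = state_index(a1, a2), state_index(a2, a1)
        b1 = epsilon_greedy(q[0][s1], epsilon, rng)
        b2 = epsilon_greedy(q[1][s2], epsilon, rng)
        r1, r2 = PAYOFF[(b1, b2)], PAYOFF[(b2, b1)]
        n1, n2 = state_index(b1, b2), state_index(b2, b1)
        # Sample-based TD update: the single sampled transition is the source of
        # the stochastic fluctuations in the learning process.
        for qi, s, a, r, n in ((q[0], s1, b1, r1, n1), (q[1], s2, b2, r2, n2)):
            target = r + gamma * max(qi[n])
            qi[s][a] += alpha * (target - qi[s][a])
        if t >= steps - window:
            coop_count += (b1 == C) + (b2 == C)
        a1, a2 = b1, b2
    return coop_count / (2 * window), q


if __name__ == "__main__":
    # Different seeds give different learning trajectories and final outcomes.
    for seed in range(3):
        rate, _ = simulate(seed=seed)
        print(f"seed {seed}: cooperation rate over final 10% of steps = {rate:.2f}")
```

Repeating the run over many seeds yields a distribution of final cooperation rates rather than a single value; the spread across seeds reflects the sampling noise of the learning process, which is the kind of intrinsic fluctuation the abstract refers to. The numbers this toy produces should not be read as a reproduction of the paper's results.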

Code Implementations: 1 repo