MA · AI · GT · LG · Sep 26, 2018

Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas

arXiv:1809.10007v2 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses cooperation challenges in multi-agent systems, particularly in social dilemmas like the Prisoner's Dilemma, with incremental improvements over existing reinforcement learning approaches.
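For readers unfamiliar with the game, the Iterated Prisoner's Dilemma can be sketched in a few lines of Python. The payoff values below (T=5, R=3, P=1, S=0) are the conventional ones from Axelrod's tournaments and are an assumption here, since the page does not state which values the paper uses. The helpers `play_ipd`, `tit_for_tat` and `always_defect` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"

# Conventional Axelrod payoffs (assumed): T=5 > R=3 > P=1 > S=0, with 2R > T + S.
PAYOFFS = {
    (C, C): (3, 3),  # mutual cooperation: reward R for both
    (C, D): (0, 5),  # sucker's payoff S vs. temptation T
    (D, C): (5, 0),
    (D, D): (1, 1),  # mutual defection: punishment P for both
}

# A strategy sees (own history, opponent history) and returns C or D.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], opp: List[str]) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return C if not opp else opp[-1]


def always_defect(own: List[str], opp: List[str]) -> str:
    # A static agent of the kind found in Axelrod tournaments.
    return D


def play_ipd(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` rounds and return the cumulative scores of a and b."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Against itself, tit-for-tat cooperates throughout and each side scores 30 over 10 rounds; against an unconditional defector it is exploited once and then defects, ending at 9 versus 14.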

The authors tackle the problem of fostering cooperation in multi-agent social dilemmas by introducing Learning through Probing (LTP), a decentralized reinforcement learning architecture that uses a probing mechanism to account for changes in opponent behavior. In the Iterated Prisoner's Dilemma, LTP agents achieve higher average cumulative rewards than the other methods they are compared with.
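The probing idea can be illustrated with a toy reward-shaping sketch. This is not the paper's algorithm: the page only says that rewards are adjusted according to the overall outcome of an experience and that a parameter η weights future changes in opponent behavior. Here we assume, hypothetically, that a probe is one action followed by a fixed window of cooperative moves, and that the shaped reward is the immediate payoff plus η times the difference between the mean follow-up payoff and a baseline (taken to be the mutual-cooperation payoff R=3). `probe`, `shaped_reward` and `window` are illustrative names, and the payoffs are the conventional Axelrod values.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"

# Row player's payoff under the conventional Axelrod values (an assumption).
PAYOFFS = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# The opponent sees the probing agent's past moves and returns its own move.
Opponent = Callable[[List[str]], str]


def tit_for_tat(agent_moves: List[str]) -> str:
    return C if not agent_moves else agent_moves[-1]


def probe(opponent: Opponent, probe_action: str, window: int = 3,
          default: str = C) -> Tuple[int, List[int]]:
    """Hypothetical probe: play `probe_action` once, then `default` for
    `window` rounds. Returns (immediate payoff, follow-up payoffs)."""
    agent_moves: List[str] = []
    payoffs: List[int] = []
    for t in range(window + 1):
        move = probe_action if t == 0 else default
        opp_move = opponent(agent_moves)
        payoffs.append(PAYOFFS[(move, opp_move)])
        agent_moves.append(move)
    return payoffs[0], payoffs[1:]


def shaped_reward(immediate: float, follow_up: List[int],
                  baseline: float, eta: float) -> float:
    # Immediate payoff plus an eta-weighted measure of how the opponent's
    # reaction changed what the agent earns afterwards.
    mean_follow_up = sum(follow_up) / len(follow_up)
    return immediate + eta * (mean_follow_up - baseline)


imm_d, follow_d = probe(tit_for_tat, D)  # (5, [0, 3, 3])
imm_c, follow_c = probe(tit_for_tat, C)  # (3, [3, 3, 3])
```

With η = 0 the shaped reward for defecting against tit-for-tat is just the temptation payoff 5, so defection looks best; with η = 3 the retaliation it provokes drags it down to 2, below the 3 earned by cooperating. In this toy setup, η therefore controls how much an agent weighs the opponent's reaction against the immediate gain.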

Multi-agent reinforcement learning has received significant interest in recent years, notably due to advances in deep reinforcement learning that have enabled the development of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), in which agents use a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes to the opponent's behavior. We introduce a parameter η (eta) to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma (IPD), LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance against the static agents present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and society-like interactions, and analyze the training of Q-learning agents in makeshift societies. The aim is to emphasize how cooperation may emerge in societies, which we demonstrate using environments where interactions with opponents are determined through a random-encounter format of the IPD.
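The random-encounter setting can be sketched as a population of independent tabular Q-learners that are shuffled and paired at random each episode. All modelling choices here are assumptions rather than the paper's setup: each agent's state is the last move the current opponent made against it (or `"start"` on a first meeting), the next state is what it will see at their next encounter, and the learning rate, discount and exploration rate are arbitrary. `QAgent` and `run_society` are hypothetical names, and the payoffs are the conventional Axelrod values.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

C, D = "C", "D"
ACTIONS = (C, D)
PAYOFFS = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


class QAgent:
    def __init__(self, rng: random.Random, alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1) -> None:
        # Q-table: state -> {action: value}, created lazily on first access.
        self.q: Dict[str, Dict[str, float]] = defaultdict(
            lambda: {a: 0.0 for a in ACTIONS})
        self.memory: Dict[int, str] = {}  # opponent id -> its last move vs. us
        self.rng, self.alpha, self.gamma, self.epsilon = rng, alpha, gamma, epsilon

    def state(self, opp_id: int) -> str:
        return self.memory.get(opp_id, "start")

    def act(self, s: str) -> str:
        # Epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[s][a])

    def update(self, s: str, a: str, r: float, s_next: str) -> None:
        # Standard one-step Q-learning update.
        best_next = max(self.q[s_next].values())
        self.q[s][a] += self.alpha * (r + self.gamma * best_next - self.q[s][a])


def run_society(n_agents: int = 10, episodes: int = 2000,
                seed: int = 0) -> Tuple[List[QAgent], List[float]]:
    rng = random.Random(seed)
    agents = [QAgent(rng) for _ in range(n_agents)]
    total = [0.0] * n_agents
    ids = list(range(n_agents))
    for _ in range(episodes):
        rng.shuffle(ids)
        for i, j in zip(ids[::2], ids[1::2]):  # random pairing this episode
            si, sj = agents[i].state(j), agents[j].state(i)
            ai, aj = agents[i].act(si), agents[j].act(sj)
            ri, rj = PAYOFFS[(ai, aj)]
            agents[i].memory[j], agents[j].memory[i] = aj, ai
            agents[i].update(si, ai, ri, agents[i].state(j))
            agents[j].update(sj, aj, rj, agents[j].state(i))
            total[i] += ri
            total[j] += rj
    # Average payoff per episode for each agent.
    return agents, [t / episodes for t in total]
```

`run_society()` returns the agents and each agent's average payoff per episode. Whether cooperation emerges depends on these hyperparameters and on the payoff values; the sketch makes no claim about reproducing the paper's results.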

