MA · AI · GT · LG · Feb 10, 2017

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

arXiv:1702.03037v1 · 697 citations
Originality: Incremental advance
AI Analysis

This work addresses a gap in multi-agent reinforcement learning: how cooperation dynamics arise in sequential social dilemmas. The advance is incremental in that it extends matrix game concepts to temporally extended domains.

The paper models real-world social dilemmas as sequences of decisions rather than single atomic choices. It introduces two Markov games (Gathering and Wolfpack) and analyzes the policies learned by multiple independent agents, each with its own deep Q-network. The results show how environmental factors such as resource abundance affect the emergence of cooperation and conflict in these sequential settings.
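The independent-learner setup summarized above can be illustrated with a deliberately simplified sketch. It is not the paper's implementation: a two-entry Q-table stands in for each agent's deep Q-network, and a stateless repeated Prisoner's Dilemma stands in for Gathering and Wolfpack. The class name, payoff values, and hyperparameters are all hypothetical. The point it shows is only that each agent updates its own estimates from its own reward and never sees the other agent's values.

```python
import random

COOPERATE, DEFECT = 0, 1

# PAYOFF[(own_action, other_action)] -> reward to the acting agent.
# Illustrative Prisoner's Dilemma values, not taken from the paper.
PAYOFF = {
    (COOPERATE, COOPERATE): 3.0,
    (COOPERATE, DEFECT): 0.0,
    (DEFECT, COOPERATE): 5.0,
    (DEFECT, DEFECT): 1.0,
}


class IndependentLearner:
    """Tabular stand-in for one agent's deep Q-network (an assumption)."""

    def __init__(self, rng, alpha=0.1, epsilon=0.1):
        self.rng = rng
        self.alpha = alpha        # learning rate
        self.epsilon = epsilon    # exploration probability
        self.q = [0.0, 0.0]       # one value estimate per action

    def act(self):
        # Epsilon-greedy choice over this agent's own estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        return max((COOPERATE, DEFECT), key=lambda a: self.q[a])

    def update(self, action, reward):
        # Stateless update: move the estimate toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])


def train(steps=5000, seed=0):
    rng = random.Random(seed)
    agents = [IndependentLearner(rng), IndependentLearner(rng)]
    for _ in range(steps):
        a0, a1 = agents[0].act(), agents[1].act()
        # Each agent learns only from its own reward; the other agent
        # is treated as part of the environment.
        agents[0].update(a0, PAYOFF[(a0, a1)])
        agents[1].update(a1, PAYOFF[(a1, a0)])
    return agents
```

Calling `train()` returns the two learners, whose `q` lists can then be inspected. Because each learner sees the other only through its rewards, the environment is non-stationary from either agent's point of view, which is the property that makes independent multi-agent learning harder than the single-agent case.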

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
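The "mixed incentive structure" of a matrix game social dilemma can be made concrete with a short check. The sketch below assumes the standard formulation from the social dilemma literature, which the text above does not spell out: with payoffs R (mutual cooperation), P (mutual defection), S (cooperating against a defector), and T (defecting against a cooperator), a game is a social dilemma when R > P, R > S, 2R > T + S, and at least one of greed (T > R) or fear (P > S) holds. The class, function names, and numeric payoffs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    R: float  # reward for mutual cooperation
    P: float  # punishment for mutual defection
    S: float  # sucker payoff: cooperate while the other defects
    T: float  # temptation: defect while the other cooperates


def motives(p):
    """Return which incentives to defect are present."""
    return {"greed": p.T > p.R, "fear": p.P > p.S}


def is_social_dilemma(p):
    """Check the assumed social dilemma inequalities."""
    m = motives(p)
    return (
        p.R > p.P                  # mutual cooperation beats mutual defection
        and p.R > p.S              # mutual cooperation beats being exploited
        and 2 * p.R > p.T + p.S    # it also beats taking turns exploiting
        and (m["greed"] or m["fear"])
    )


# Illustrative payoff values (assumptions, not taken from the paper).
PRISONERS_DILEMMA = Payoffs(R=3, P=1, S=0, T=5)
CHICKEN = Payoffs(R=3, P=0, S=1, T=4)
STAG_HUNT = Payoffs(R=4, P=1, S=0, T=3)
HARMONY = Payoffs(R=4, P=1, S=2, T=3)  # no incentive to defect
```

Under these values, Prisoner's Dilemma exhibits both greed and fear, Chicken only greed, and Stag Hunt only fear, while the Harmony payoffs fail the last condition and so do not form a dilemma. In the sequential setting described in the abstract, the same question is asked of whole policies rather than of single actions.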

Code Implementations: 4 repos
