AI, LG, MA | Feb 28, 2017

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

arXiv:1702.08887v3 | 665 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key bottleneck in scaling deep RL to multi-agent systems such as network routing and traffic control. The advance is incremental because it builds on existing independent Q-learning.

The paper tackles the nonstationarity that arises when experience replay is used in multi-agent reinforcement learning (RL). It proposes two methods: a multi-agent form of importance sampling that decays obsolete data, and conditioning each agent's value function on a fingerprint that indicates the age of the replayed data. Results on a decentralized StarCraft micromanagement task show that these methods allow experience replay to be combined successfully with multi-agent RL.

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
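The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the fingerprint is the pair (training iteration, exploration rate) recorded when the transition was collected, and that the importance weight is the ratio between the probability the other agents' current policies assign to their recorded actions and the probability they assigned at collection time. The names `FingerprintReplay`, `importance_weight`, and the `clip` parameter are hypothetical, and the clipping is an added safeguard rather than something stated in the abstract.

```python
import random
from collections import deque


class FingerprintReplay:
    """Replay memory whose transitions carry data-age information (illustrative)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon, others_prob):
        # Assumed fingerprint: training iteration and exploration rate at collection time.
        fingerprint = (float(iteration), float(epsilon))
        self.buffer.append({
            # The fingerprint is appended to the observation, so the value
            # function can tell old data from recent data.
            "obs": tuple(obs) + fingerprint,
            "action": action,
            "reward": reward,
            # Simplification: the same fingerprint is reused for the next observation.
            "next_obs": tuple(next_obs) + fingerprint,
            # Probability of the other agents' joint action when it was taken.
            "others_prob": others_prob,
        })

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)


def importance_weight(current_others_prob, stored_others_prob, clip=10.0):
    """Ratio of the other agents' joint action probability now vs. at collection.

    Transitions generated by policies the other agents no longer follow get a
    small weight, so obsolete data fades from the loss. The clip is an
    assumption of this sketch that keeps a single sample from dominating.
    """
    if stored_others_prob <= 0.0:
        raise ValueError("stored probability must be positive")
    return min(current_others_prob / stored_others_prob, clip)
```

In a training loop, each sampled transition's temporal-difference loss would be multiplied by `importance_weight(...)`, while the fingerprinted `obs` would be fed directly to the agent's Q-network. The two techniques can also be used separately.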

Code Implementations: 5 repos
