AI · MA · May 24, 2017

Counterfactual Multi-Agent Policy Gradients

arXiv:1705.08926v3 · 2570 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient reinforcement learning methods in domains such as network packet routing and autonomous vehicle coordination. It represents an incremental advance in multi-agent credit assignment.

The paper tackles the problem of learning decentralized policies in cooperative multi-agent systems. It proposes COMA, a multi-agent actor-critic method that uses a counterfactual baseline for credit assignment. In StarCraft unit micromanagement, COMA significantly improves average performance over other multi-agent actor-critic methods, and its best-performing agents are competitive with centralized controllers that have access to the full state.

Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
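
The counterfactual baseline described in the abstract can be illustrated with a small sketch. It assumes the centralised critic has already produced, for one agent, a vector of Q-values with one entry per action of that agent and the other agents' actions held fixed. The function name `counterfactual_advantage` and the example numbers are hypothetical and used only for illustration; this is not the authors' implementation. The advantage is the Q-value of the action actually taken minus the policy-weighted average of the Q-values over that agent's alternative actions.

```python
import numpy as np


def counterfactual_advantage(q_agent, pi_agent, action):
    """Counterfactual advantage for a single agent (illustrative sketch).

    q_agent[k]  : critic estimate of the joint-action value when this agent
                  takes action k and all other agents' actions stay fixed.
    pi_agent[k] : this agent's policy probability of taking action k.
    action      : index of the action the agent actually took.
    """
    q_agent = np.asarray(q_agent, dtype=float)
    pi_agent = np.asarray(pi_agent, dtype=float)
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = float(np.dot(pi_agent, q_agent))
    return float(q_agent[action] - baseline)


# Hypothetical values for one agent with three available actions.
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=1)
print(adv)  # 3.0 - (0.5*1.0 + 0.25*3.0 + 0.25*2.0) = 1.25
```

Because the baseline depends only on the other agents' actions and on this agent's policy, the advantage averages to zero under that policy. Since all the Q-values for one agent come out of the critic as a single vector, the baseline needs only one forward pass.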

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
