LG · AI · Dec 23, 2021

Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

arXiv:2112.12458v3 · 1 citation
Originality: Highly original
AI Analysis

This work addresses the challenge of scalable and efficient learning in cooperative multi-agent systems, offering a promising alternative to existing value-factorization methods.

The paper tackles cooperative multi-agent reinforcement learning in partially observable environments by introducing Local Advantage Networks (LAN), which learn a decentralized best-response policy for each agent using individual advantage functions stabilized by a centralized critic. The result is state-of-the-art performance on the StarCraft II multi-agent challenge benchmark, and the method scales well with the number of agents.

Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn a decentralized best-response policy for each agent via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is discarded after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is highly scalable with respect to the number of agents, opening up a promising alternative direction for MARL research.
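
To make the dueling idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a common dueling combination, Q = V + (A - max A), with toy linear advantage heads; the function names, weights, observations, and the critic value are all hypothetical. It illustrates why a centralized value term can be dropped at execution time: adding the same value to every action of an agent does not change which action is greedy.

```python
# Illustrative sketch only: linear "networks" and made-up numbers, not the paper's code.

def local_advantages(weights, observation):
    """Hypothetical per-agent advantage head: one score per action,
    computed from the agent's own observation only."""
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]

def dueling_q(central_value, advantages):
    """Dueling-style combination (assumed form): Q = V + (A - max A)."""
    best = max(advantages)
    return [central_value + (a - best) for a in advantages]

def greedy_action(scores):
    """Index of the highest score (first index on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)

# Two agents, two-dimensional observations, three actions each (all hypothetical).
agent_weights = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [[0.5, 0.5], [1.0, -1.0], [0.0, 0.0]],
]
observations = [[0.5, 2.0], [1.0, 1.0]]
central_value = 3.0  # stand-in for the centralized critic's output during training

advantages = [local_advantages(w, o) for w, o in zip(agent_weights, observations)]

# During training, each agent's Q-values include the shared value term.
training_actions = [greedy_action(dueling_q(central_value, a)) for a in advantages]

# At execution, each agent acts greedily on its local advantages alone.
execution_actions = [greedy_action(a) for a in advantages]
```

In this toy setup both action lists coincide, which is the property that lets each agent act from its local advantage function once the critic is gone. How the critic is trained and what inputs it receives are not specified in the abstract, so they are left out of the sketch.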

