LG · AI · MA · Jun 17, 2021

Many Agent Reinforcement Learning Under Partial Observability

arXiv:2106.09825v1
Originality: Incremental advance
AI Analysis

This work addresses the scalability of MARL as the number of agents grows, which matters to both researchers and practitioners, but the advance is incremental because it applies action anonymity to existing algorithms (MADDPG and IA2C).

The paper tackles the scalability challenge in multi-agent reinforcement learning (MARL) under partial observability by applying action anonymity to existing algorithms like MADDPG and IA2C, showing they can learn optimal behavior in a broader class of agent networks than mean-field MARL.

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
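
The scalability argument in the abstract can be illustrated with a small sketch. Under action anonymity, only the number of agents choosing each action matters, not which agent chose it, so a joint action can be replaced by a vector of counts that is invariant to permuting the agents. The Python below is an illustration written for this summary, not code from the paper; the function name `action_configuration` and the toy sizes (4 agents, 3 actions) are assumptions chosen for readability.

```python
from itertools import product
from math import comb


def action_configuration(joint_action, num_actions):
    """Collapse a joint action (one action index per agent) into counts.

    Hypothetical helper for illustration: entry a of the result is the
    number of agents that chose action a. Agent identities are discarded,
    which is what makes the representation permutation invariant.
    """
    counts = [0] * num_actions
    for action in joint_action:
        counts[action] += 1
    return tuple(counts)


# Toy sizes, assumed for illustration only.
num_agents, num_actions = 4, 3

# Every joint action: one action index per agent (exponential in num_agents).
joint_actions = list(product(range(num_actions), repeat=num_agents))

# Distinct count vectors reachable from those joint actions.
configurations = {action_configuration(ja, num_actions) for ja in joint_actions}

num_joint = len(joint_actions)
num_configs = len(configurations)

# Two joint actions that differ only in who did what map to the same counts.
same_counts = (
    action_configuration((0, 2, 2, 1), num_actions)
    == action_configuration((2, 1, 0, 2), num_actions)
)

print(num_joint, num_configs, same_counts)
```

With these toy sizes there are 3^4 = 81 joint actions but only C(6, 2) = 15 count vectors. In general the number of count vectors is C(n + |A| - 1, |A| - 1) for n agents and |A| actions, which is polynomial in n for a fixed action set, whereas the number of joint actions is |A|^n.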

