LG AI MA Jun 17, 2021

Many Agent Reinforcement Learning Under Partial Observability

Keyang He, Prashant Doshi, Bikramjit Banerjee

arXiv:2106.09825v1 1.6

Originality: Incremental advance

AI Analysis

This work addresses a scalability problem that matters to MARL researchers and practitioners, but the advance is incremental: it applies action anonymity to existing algorithms rather than introducing a new one.

The paper tackles the scalability challenge in multi-agent reinforcement learning (MARL) under partial observability by applying action anonymity to two existing algorithms, MADDPG and IA2C. It shows that the resulting variants can learn optimal behavior in a broader class of agent networks than mean-field MARL.

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
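The scalability argument in the abstract can be made concrete with a small sketch. Under action anonymity, only how many agents chose each action matters, not which agent chose it, so a joint action can be collapsed into a vector of counts that is invariant to permuting the agents. The Python below is an illustration under that assumption only; the function names and the count-vector encoding are hypothetical and are not taken from the paper's implementation.

```python
from collections import Counter
from itertools import product
from math import comb


def action_configuration(joint_action, num_actions):
    """Collapse a joint action into per-action counts.

    joint_action: sequence of action indices, one per agent.
    Returns a tuple of length num_actions; reordering the agents
    leaves the result unchanged (permutation invariance).
    """
    counts = Counter(joint_action)
    return tuple(counts.get(a, 0) for a in range(num_actions))


def num_joint_actions(n_agents, num_actions):
    # Every agent picks independently: exponential in n_agents.
    return num_actions ** n_agents


def num_configurations(n_agents, num_actions):
    # Ways to split n_agents among num_actions bins (stars and bars):
    # polynomial in n_agents for a fixed number of actions.
    return comb(n_agents + num_actions - 1, num_actions - 1)


if __name__ == "__main__":
    # Two orderings of the same choices map to one configuration.
    print(action_configuration((0, 2, 2, 1), 3))
    print(action_configuration((2, 1, 0, 2), 3))

    # Brute-force check of the counting formula for a small case.
    seen = {action_configuration(ja, 3) for ja in product(range(3), repeat=4)}
    print(len(seen), num_configurations(4, 3))

    # Growth comparison for 10 agents with 3 actions each.
    print(num_joint_actions(10, 3), num_configurations(10, 3))
```

With 3 actions, the number of joint actions triples with every added agent, while the number of count vectors grows only quadratically. That gap is the reason a representation over counts can stay tractable as the number of agents increases.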
