LG · AI · MA · ML · May 27, 2020

Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning

arXiv:2005.13625v8 · 31 citations
Originality: Incremental advance
AI Analysis

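This work addresses a foundational problem in multi-agent deep reinforcement learning: how agents that share one set of policy parameters can still learn different behaviors. It is an incremental advance that extends an existing technique, parameter sharing, to environments with heterogeneous observation and action spaces.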

The paper tackles a limitation of parameter sharing in multi-agent deep reinforcement learning: because all agents share the same parameters, they cannot learn different policies. It formalizes agent indication, extends parameter sharing to heterogeneous observation and action spaces, proves that these methods allow convergence to optimal policies, and validates them experimentally on image-based observations.

Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication". Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves, for the first time, that it enables convergence to optimal policies. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we confirm experimentally that the methods we introduce work in practice, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for image-based observation spaces.
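
To make the idea of agent indication concrete, here is a minimal sketch in Python. It assumes flat vector observations, a one-hot indicator, and zero-padding to bring observations of different lengths to a common size. These choices, and the helper names `pad_observation`, `indicate_agent`, and `shared_policy_input`, are illustrative assumptions and are not taken from the paper, which studies a range of indication schemes for image-based observations.

```python
from typing import List, Sequence


def pad_observation(obs: Sequence[float], target_len: int) -> List[float]:
    """Zero-pad a flat observation so every agent's input has the same length.

    Zero-padding is an assumption made for this sketch.
    """
    if len(obs) > target_len:
        raise ValueError("observation is longer than the padded size")
    return list(obs) + [0.0] * (target_len - len(obs))


def indicate_agent(obs: Sequence[float], agent_index: int, num_agents: int) -> List[float]:
    """Append a one-hot agent indicator to a flat observation."""
    if not 0 <= agent_index < num_agents:
        raise ValueError("agent_index out of range")
    one_hot = [0.0] * num_agents
    one_hot[agent_index] = 1.0
    return list(obs) + one_hot


def shared_policy_input(
    obs: Sequence[float], agent_index: int, num_agents: int, target_len: int
) -> List[float]:
    """Build the input a single shared policy network would see for one agent."""
    return indicate_agent(pad_observation(obs, target_len), agent_index, num_agents)


# Two hypothetical agents with observations of different lengths.
obs_a = [0.5, -1.0]
obs_b = [0.1, 0.2, 0.3]
x_a = shared_policy_input(obs_a, agent_index=0, num_agents=2, target_len=3)
x_b = shared_policy_input(obs_b, agent_index=1, num_agents=2, target_len=3)
```

Both inputs now have the same length, so one network can process them, while the trailing indicator lets that network condition its output on which agent is acting.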

Code Implementations: 1 repo