LG AI · May 25, 2022

Scalable Multi-Agent Model-Based Reinforcement Learning

arXiv:2205.15023v1 · 17.344 citations · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

This work targets sample efficiency in multi-agent systems, a concern for researchers and practitioners alike, and offers an incremental improvement over existing communication-based methods.

The paper tackles sample inefficiency in cooperative multi-agent reinforcement learning by proposing MAMBA, a model-based method in which inter-agent communication sustains a world model for each agent and training relies on imaginary rollouts. In the SMAC and Flatland domains, it reduces environment interactions by up to an order of magnitude compared with model-free approaches.

Recent Multi-Agent Reinforcement Learning (MARL) literature has largely focused on the Centralized Training with Decentralized Execution (CTDE) paradigm. CTDE has been the dominant approach for both cooperative and mixed environments because it can train decentralized policies efficiently. While full autonomy of the agents can be a desirable outcome in mixed environments, cooperative environments allow agents to share information to facilitate coordination. Approaches that leverage this are usually referred to as communication methods, since they give up some of the agents' autonomy in exchange for better performance. Although communication approaches have shown impressive results, they do not fully leverage this additional information during the training phase. In this paper, we propose a new method called MAMBA, which uses Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments. We argue that communication between agents is enough to sustain a world model for each agent during the execution phase, while imaginary rollouts can be used for training, removing the need to interact with the environment. These properties yield a sample-efficient algorithm that can scale gracefully with the number of agents. We empirically confirm that MAMBA achieves good performance while reducing the number of interactions with the environment by up to an order of magnitude compared to model-free state-of-the-art approaches in the challenging SMAC and Flatland domains.
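The abstract's central idea, fitting a world model from a limited batch of real transitions and then evaluating policies on imagined rollouts instead of further environment steps, can be illustrated with a generic Dyna-style toy. This is a minimal sketch under stated assumptions, not MAMBA itself: the environment (`ToyEnv`), the shared "spread" reward, the tabular displacement model, the mean-position message, and the three candidate policies are all hypothetical. Selecting among fixed policies also stands in for the learned, gradient-based policy optimization a real method would use. Real interactions are counted only during data collection; policy evaluation happens entirely inside the learned model.

```python
import random

# Hypothetical toy task (not from the paper): three agents on a line, actions
# in {-1, 0, +1}, and a shared team reward equal to minus the spread of their
# positions around the mean, so cooperation means clustering together.
ACTIONS = (-1, 0, 1)
N_AGENTS = 3


def team_reward(positions):
    """Shared cooperative reward; assumed known here rather than learned."""
    mean = sum(positions) / len(positions)
    return -sum(abs(x - mean) for x in positions)


class ToyEnv:
    """Ground-truth dynamics; every step() call counts as a real interaction."""

    def __init__(self):
        self.interactions = 0

    def step(self, positions, actions):
        self.interactions += 1
        return [x + a for x, a in zip(positions, actions)]


def collect_real_data(env, rng, episodes, length):
    """Gather per-agent (x, a, x_next) transitions using random actions."""
    transitions, starts = [], []
    for _ in range(episodes):
        pos = [rng.randint(-5, 5) for _ in range(N_AGENTS)]
        starts.append(pos)
        for _ in range(length):
            acts = [rng.choice(ACTIONS) for _ in range(N_AGENTS)]
            nxt = env.step(pos, acts)
            transitions.extend(zip(pos, acts, nxt))
            pos = nxt
    return transitions, starts


def fit_world_model(transitions):
    """Tabular world model: average observed displacement for each action."""
    sums = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for x, a, x_next in transitions:
        sums[a] += x_next - x
        counts[a] += 1
    return {a: sums[a] / counts[a] if counts[a] else 0.0 for a in ACTIONS}


def policy(name, x, message):
    """Decentralized policy: sees only its own position and the broadcast message."""
    gap = message - x
    toward = 0 if abs(gap) <= 0.5 else (1 if gap > 0 else -1)
    return {"stay": 0, "toward": toward, "away": -toward}[name]


def imagined_return(model, start, name, horizon):
    """Roll a policy out inside the learned model; env.step() is never called."""
    pos = list(start)
    total = 0.0
    for _ in range(horizon):
        message = sum(pos) / len(pos)  # hypothetical message: mean of broadcast positions
        acts = [policy(name, x, message) for x in pos]
        pos = [x + model[a] for x, a in zip(pos, acts)]
        total += team_reward(pos)
    return total


rng = random.Random(0)
env = ToyEnv()
transitions, starts = collect_real_data(env, rng, episodes=20, length=5)
model = fit_world_model(transitions)
scores = {
    name: sum(imagined_return(model, s, name, horizon=10) for s in starts)
    for name in ("stay", "toward", "away")
}
best_policy = max(scores, key=scores.get)
print(best_policy, env.interactions)
```

Only the 100 transitions gathered in `collect_real_data` touch the environment; the 600 imagined steps used to rank the policies do not. The learned model happens to be exact here because the toy dynamics are deterministic and linear. With a learned neural world model, prediction errors compound over the imagination horizon, which is one reason such horizons are usually kept short.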

