LG AI · Feb 10, 2021

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

Xiaoteng Ma, Yiqin Yang, Chenghao Li, Yiwen Lu, Qianchuan Zhao, Yang Jun

arXiv:2102.06042v18.419 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient teamwork in cooperative multi-agent systems, such as games or real-life applications, and offers an incremental advance in how agent interactions are modeled.

The paper tackles the insufficient attention paid to agent interactions in cooperative multi-agent reinforcement learning, which limits both collaborative exploration and value function estimation. It proposes the IAC algorithm, which models interactions from the perspectives of policy and value function, and reports better performance than state-of-the-art approaches on benchmark tasks.

Value-based methods for multi-agent reinforcement learning (MARL), especially value decomposition methods, have been demonstrated on a range of challenging cooperative tasks. However, current methods pay little attention to the interaction between agents, which is essential to teamwork in games or real life. This limits the efficiency of value-based MARL algorithms in two respects: collaborative exploration and value function estimation. In this paper, we propose a novel cooperative MARL algorithm named interactive actor-critic (IAC), which models the interaction of agents from the perspectives of policy and value function. On the policy side, a multi-agent joint stochastic policy is introduced by adopting a collaborative exploration module, which is trained by maximizing the entropy-regularized expected return. On the value side, we use a shared attention mechanism to estimate the value function of each agent, which takes the impact of teammates into consideration. At the implementation level, we extend value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments. Experimental results indicate that our method outperforms state-of-the-art approaches and achieves better performance in terms of cooperation.
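
The abstract does not spell out the attention architecture, so the following is only a minimal sketch of what a shared attention step over teammates could look like. It assumes generic scaled dot-product attention, with one set of projection matrices reused for every agent; the function name `attend_to_teammates`, the matrices `W_q`, `W_k`, `W_v`, and the toy embeddings are all hypothetical and not taken from the paper.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [dot(row, v) for row in W]


def attend_to_teammates(embeddings, W_q, W_k, W_v):
    """For each agent i, return (weights over teammates, teammate context).

    Assumption: the same W_q, W_k, W_v are used for every agent, which is
    one plausible reading of "shared" attention. Requires >= 2 agents.
    """
    queries = [matvec(W_q, e) for e in embeddings]
    keys = [matvec(W_k, e) for e in embeddings]
    values = [matvec(W_v, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))
    results = []
    for i, q in enumerate(queries):
        # Each agent attends only to the other agents, not to itself.
        others = [j for j in range(len(embeddings)) if j != i]
        weights = softmax([dot(q, keys[j]) / scale for j in others])
        context = [
            sum(w * values[j][d] for w, j in zip(weights, others))
            for d in range(len(values[0]))
        ]
        results.append((dict(zip(others, weights)), context))
    return results


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection weights
    agent_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy inputs
    for i, (w, ctx) in enumerate(
        attend_to_teammates(agent_embeddings, identity, identity, identity)
    ):
        print(f"agent {i}: weights={w} context={ctx}")
```

In a critic along these lines, each agent's context vector would be combined with its own embedding and passed to a value head, so that the estimate for one agent depends on what its teammates observe and do. How IAC actually does this is described in the paper itself, not here.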
