MA LG May 23, 2022

Learning to Advise and Learning from Advice in Cooperative Multi-Agent Reinforcement Learning

Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang

arXiv:2205.11163v1 1.2 h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses coordination challenges in multi-agent systems, offering a new perspective for analyzing and improving MARL algorithms, though it appears incremental in building on existing concepts like hierarchy and adversarial learning.

The paper tackles the problem of coordination in multi-agent reinforcement learning by proposing a novel approach called Learning to Advise and Learning from Advice (LALA), which uses an advisor and policy discriminator to enhance decision-making at different levels, resulting in improved learning efficiency and coordination capability over baseline methods.

Learning to coordinate is a daunting problem in multi-agent reinforcement learning (MARL). Previous works have explored it from many facets, including cognition between agents, credit assignment, communication, and expert demonstration. However, less attention has been paid to agents' decision structure and the hierarchy of coordination. In this paper, we explore the spatiotemporal structure of agents' decisions and consider the hierarchy of coordination from the perspective of multilevel emergence dynamics, based on which a novel approach, Learning to Advise and Learning from Advice (LALA), is proposed to improve MARL. Specifically, by distinguishing the hierarchy of coordination, we propose to enhance decision coordination at the meso level with an advisor and to leverage a policy discriminator to advise agents' learning at the micro level. The advisor learns to aggregate decision information in both the spatial and temporal domains and generates coordinated decisions by employing a spatiotemporal dual graph convolutional neural network with a task-oriented objective function. Each agent learns from the advice via a policy generative adversarial learning method in which a discriminator distinguishes between the policies of the agent and the advisor and boosts both of them based on its judgement. Experimental results indicate the advantage of LALA over baseline approaches in terms of both learning efficiency and coordination capability. The coordination mechanism is examined through multilevel emergence dynamics and mutual information, providing a novel perspective and method for analyzing and improving MARL algorithms.
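The abstract names mutual information as one lens for examining coordination but does not say which quantity the paper computes. As a rough illustration only, the sketch below estimates the standard empirical mutual information between two agents' discrete action sequences. The function name `mutual_information` and the choice of per-step actions as the variables are assumptions made here, not the authors' method.

```python
from collections import Counter
from math import log


def mutual_information(actions_a, actions_b):
    """Empirical mutual information (in nats) between two discrete action sequences.

    Hypothetical helper for illustration; not taken from the LALA paper.
    """
    if len(actions_a) != len(actions_b) or not actions_a:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(actions_a)
    joint = Counter(zip(actions_a, actions_b))  # counts of (a, b) pairs
    count_a = Counter(actions_a)                # marginal counts for agent A
    count_b = Counter(actions_b)                # marginal counts for agent B

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written in terms of raw counts
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi


# Agents that always pick the same action share log(2) nats over two actions;
# agents whose actions are empirically independent share none.
coordinated = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this estimate, a higher value means that one agent's action tells you more about the other's, which is one simple proxy for coordination. It does not distinguish useful coordination from mere correlation.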
