LG AI MA | Apr 17, 2024

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

arXiv:2404.10976v3 | 20.338 citations | h-index: 20 | Has Code | IJCAI

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving collaboration in multi-agent systems, with applications such as gaming and robotics. It appears to be an incremental advance, since it builds on existing graph-based methods.

The paper tackles the problem of learning higher-order relationships in multi-agent reinforcement learning (MARL) by proposing a Group-Aware Coordination Graph (GACG) that captures both agent-pair and group-level dependencies. Evaluations on StarCraft II micromanagement tasks show superior performance, and an ablation study supports the effectiveness of each component.

Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
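The abstract combines three mechanisms: agent-pair edges from current observations, group-level edges from behaviour across trajectories, and graph convolution over the resulting graph, plus a group distance loss. The NumPy sketch below shows one way such pieces could fit together. It is not the authors' implementation. The attention-style pair weights, the fixed group labels (standing in for groups inferred from trajectories), the mixing weight `alpha`, the row-normalised convolution, and the centroid-based form of the group distance loss are all illustrative assumptions, and every function name is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries map to probability 0.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pairwise_adjacency(obs_emb, w_q, w_k):
    """Agent-pair edge weights from current observations (attention-style; assumed).

    Requires at least two agents so each row has a finite entry.
    """
    q = obs_emb @ w_q
    k = obs_emb @ w_k
    scores = q @ k.T / np.sqrt(q.shape[1])
    np.fill_diagonal(scores, -np.inf)  # no self-edges; self-loops are added in graph_conv
    return softmax(scores, axis=1)     # each row sums to 1


def group_adjacency(groups):
    """1.0 where two distinct agents share a group label, else 0.0.

    The labels are a hypothetical stand-in for groups inferred from trajectories.
    """
    g = np.asarray(groups)
    same = (g[:, None] == g[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    return same


def combined_adjacency(a_pair, a_group, alpha=0.5):
    # Hypothetical convex mix of the pair-level and group-level graphs.
    return alpha * a_pair + (1.0 - alpha) * a_group


def graph_conv(h, adj, w):
    """One graph-convolution step: row-normalised (A + I) aggregation, linear map, ReLU."""
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.maximum(a_hat @ h @ w, 0.0)


def group_distance_loss(h, groups):
    """Mean distance to own group centroid minus mean distance between centroids (assumed form)."""
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    idx = idx.reshape(-1)
    centroids = np.stack([h[idx == k].mean(axis=0) for k in range(len(labels))])
    intra = np.linalg.norm(h - centroids[idx], axis=1).mean()  # cohesion within groups
    if len(labels) < 2:
        return float(intra)
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    inter = dist[np.triu_indices(len(labels), k=1)].mean()  # separation between groups
    return float(intra - inter)


rng = np.random.default_rng(0)
n_agents, d_obs, d_k, d_h = 5, 8, 4, 6
obs = rng.normal(size=(n_agents, d_obs))  # stand-in observation embeddings
groups = [0, 0, 1, 1, 1]                   # hypothetical group labels

a_pair = pairwise_adjacency(obs, rng.normal(size=(d_obs, d_k)), rng.normal(size=(d_obs, d_k)))
adj = combined_adjacency(a_pair, group_adjacency(groups))
h = graph_conv(obs, adj, rng.normal(size=(d_obs, d_h)))
print(adj.shape, h.shape)  # (5, 5) (5, 6)
print(group_distance_loss(h, groups))
```

Minimising this loss shrinks distances inside each group while pushing group centroids apart, which mirrors the stated goals of group cohesion and specialization between groups. How such a term would be weighted against the reinforcement-learning objective is left open here.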

