AI

Mar 26, 2024

Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph

Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai

arXiv:2403.18056v14.23 citations

h-index: 14

IEEE Trans Emerg Top Comput Intell

Originality: Incremental advance

AI Analysis

The paper addresses the need for interpretable and extensible hierarchical cooperation in multi-agent systems, with possible applications in domains such as robotics and gaming. It is an incremental advance that builds on existing MARL methods.

The paper tackles complex multi-agent cooperation in reinforcement learning by proposing a hierarchical model, HCGL, in which a dynamic Extensible Cooperation Graph guides agent behaviors. The authors report strong performance on sparse-reward benchmarks and high zero-shot transfer success rates in large-scale scenarios.

Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
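The three-layer structure described in the abstract can be illustrated with a small data-structure sketch. This is not the authors' implementation: the class `CooperationGraph`, the round-robin initial topology, and the two operators `move_agent` and `move_cluster` are hypothetical names chosen for illustration. The abstract does not specify what the paper's four graph operators do, and in HCGL they are trained by an MARL optimizer, whereas here they are called by hand. The sketch only shows how an agent's behavior can be read off the graph topology rather than from a per-agent policy network.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CooperationGraph:
    """Toy three-layer graph: agent nodes -> cluster nodes -> target nodes.

    Hypothetical simplification: each agent has exactly one outgoing edge
    to a cluster, and each cluster has exactly one outgoing edge to a target.
    """
    n_agents: int
    n_clusters: int
    n_targets: int
    agent_to_cluster: List[int] = field(default_factory=list)
    cluster_to_target: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed initial topology: spread nodes round-robin over the next layer.
        if not self.agent_to_cluster:
            self.agent_to_cluster = [a % self.n_clusters for a in range(self.n_agents)]
        if not self.cluster_to_target:
            self.cluster_to_target = [c % self.n_targets for c in range(self.n_clusters)]

    def move_agent(self, agent: int, cluster: int) -> None:
        """Hypothetical operator: re-route one agent -> cluster edge."""
        if not 0 <= cluster < self.n_clusters:
            raise ValueError("cluster index out of range")
        self.agent_to_cluster[agent] = cluster

    def move_cluster(self, cluster: int, target: int) -> None:
        """Hypothetical operator: re-route one cluster -> target edge."""
        if not 0 <= target < self.n_targets:
            raise ValueError("target index out of range")
        self.cluster_to_target[cluster] = target

    def members(self, cluster: int) -> List[int]:
        """Agents currently attached to the given cluster."""
        return [a for a, c in enumerate(self.agent_to_cluster) if c == cluster]

    def target_of(self, agent: int) -> int:
        """Follow agent -> cluster -> target to find what the agent should pursue."""
        return self.cluster_to_target[self.agent_to_cluster[agent]]


if __name__ == "__main__":
    graph = CooperationGraph(n_agents=4, n_clusters=2, n_targets=3)
    graph.move_cluster(0, 2)   # the whole of cluster 0 now pursues target 2
    graph.move_agent(1, 0)     # agent 1 joins cluster 0
    for agent in range(graph.n_agents):
        print(agent, graph.agent_to_cluster[agent], graph.target_of(agent))
```

In this toy version, a single `move_cluster` call redirects every agent in that cluster at once, which is one way to picture how cluster-level (cooperative) actions and agent-level (primitive) actions could share one graph interface.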
