CORD: Generalizable Cooperation via Role Diversity
The work addresses a critical issue for real-world deployment of multi-agent systems, though it appears incremental because it builds on existing hierarchical MARL methods.
The paper tackles overfitting in cooperative multi-agent reinforcement learning, where learned policies fail to generalize to unseen collaborators. It proposes CORD, a hierarchical approach that uses role diversity to improve generalization, and reports better performance than baselines across a variety of tasks.
Cooperative multi-agent reinforcement learning (MARL) aims to develop agents that can collaborate effectively. However, most cooperative MARL methods overfit to the agents they are trained with, so the learned policies generalize poorly to unseen collaborators, which is a critical issue for real-world deployment. Some methods attempt to address this generalization problem but require prior knowledge or predefined policies of new teammates, which limits their real-world applicability. To this end, we propose CORD, a hierarchical MARL approach that enables generalizable cooperation via role diversity. CORD's high-level controller assigns roles to low-level agents by maximizing the role entropy subject to constraints. We show that this constrained objective can be decomposed into two terms: causal influence in role, which enables reasonable role assignment, and role heterogeneity, which yields coherent, non-redundant role clusters. Evaluated on a variety of cooperative multi-agent tasks, CORD achieves better performance than baselines, especially in generalization tests. Ablation studies further demonstrate the efficacy of the constrained objective for generalizable cooperation.
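As a rough illustration of the role entropy that the abstract refers to, the minimal sketch below computes the Shannon entropy of an empirical role distribution. It is not CORD's objective: the constraints and the decomposition into causal influence and role heterogeneity are not modeled here. The function name `role_entropy` and the example assignments are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def role_entropy(assignments):
    """Shannon entropy (in nats) of the empirical distribution over roles.

    `assignments` is a list of role ids, one per agent (an assumed,
    simplified representation of a high-level role assignment).
    """
    counts = Counter(assignments)
    total = len(assignments)
    # Sum p * log(1 / p) over the roles that actually appear.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical assignments of four agents to roles.
collapsed = [0, 0, 0, 0]  # every agent receives the same role
diverse = [0, 1, 2, 3]    # every agent receives a distinct role

print(role_entropy(collapsed))
print(role_entropy(diverse))
```

In this toy setting the collapsed assignment has zero entropy, while the fully diverse assignment reaches the maximum for four roles, log 4. Maximizing an entropy of this kind pushes assignments toward the diverse case, which suggests why an unconstrained version could spread roles without regard to the task and why the abstract pairs the entropy with constraints.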