MA AI LG | Nov 2, 2024

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

Weifan Long, Wen Wen, Peng Zhai, Lihua Zhang

arXiv:2411.01166v15.15 citations | h-index: 15 | Knowledge-Based Systems

Originality: Highly original

AI Analysis

This work addresses the challenge of agents adapting to unseen partners in multi-agent systems, offering incremental improvements in robustness for applications such as gaming or robotics.

The paper tackles the zero-shot coordination problem in multi-agent reinforcement learning by proposing the Role Play framework, which uses role embeddings to improve policy diversity and adaptability, achieving consistent performance gains over baselines in cooperative and mixed-motive games with unseen agents.

The zero-shot coordination problem in multi-agent reinforcement learning (MARL), which requires agents to adapt to unseen agents, has attracted increasing attention. Traditional approaches often rely on the Self-Play (SP) framework to generate a diverse set of policies in a policy pool, which serves to improve the generalization capability of the final agent. However, these frameworks may struggle to capture the full spectrum of potential strategies, especially in real-world scenarios that demand that agents balance cooperation with competition. In such settings, agents need strategies that can adapt to varying and often conflicting goals. Drawing inspiration from Social Value Orientation (SVO), in which individuals maintain stable value orientations during interactions with others, we propose a novel framework called Role Play (RP). RP employs role embeddings to transform the challenge of policy diversity into a more manageable diversity of roles. It trains a common policy with role embedding observations and employs a role predictor to estimate the joint role embeddings of other agents, helping the learning agent adapt to its assigned role. We theoretically prove that an approximately optimal policy can be achieved by optimizing the expected cumulative reward relative to an approximate role-based policy. Experimental results in both cooperative (Overcooked) and mixed-motive games (Harvest, CleanUp) show that RP consistently outperforms strong baselines when interacting with unseen agents, highlighting its robustness and adaptability in complex environments.
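The abstract does not say how roles are encoded or how they shape rewards, so the following is only a rough sketch of the idea of a role-conditioned observation. It assumes the common SVO "angle" formulation, in which a role is an angle that blends an agent's own reward with the mean reward of the others. The functions `svo_reward`, `role_embedding`, and `augment_observation` are hypothetical names introduced here for illustration, and the predicted roles of other agents are passed in by hand where RP would use a learned role predictor.

```python
import math


def svo_reward(own_reward, others_mean_reward, angle_rad):
    """Blend own and others' reward by an SVO angle.

    Assumption: the standard SVO ring formulation, not confirmed by the abstract.
    angle 0 is fully selfish, angle pi/2 is fully altruistic.
    """
    return math.cos(angle_rad) * own_reward + math.sin(angle_rad) * others_mean_reward


def role_embedding(angle_rad):
    """Hypothetical 2-D role embedding: the point (cos(angle), sin(angle))."""
    return [math.cos(angle_rad), math.sin(angle_rad)]


def augment_observation(obs, own_role, predicted_other_roles):
    """Append the agent's own role embedding, then the predicted role
    embeddings of the other agents, to the raw observation vector."""
    augmented = list(obs) + list(own_role)
    for other_role in predicted_other_roles:
        augmented.extend(other_role)
    return augmented


# A selfish agent (angle 0) paired with one partner predicted to be altruistic.
own = role_embedding(0.0)
partner_guess = role_embedding(math.pi / 2)
policy_input = augment_observation([0.5, 0.2], own, [partner_guess])
```

Under these assumptions, a single shared policy would receive `policy_input` instead of the raw observation, so the same network can act differently depending on the role it is assigned and the roles it expects from others.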
