AI MA · May 29, 2025

ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork

Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone

arXiv:2505.23686v2 · 11.14 citations · h-index: 11 · Has Code

Originality: Highly original

AI Analysis

This paper addresses robust, generalizable teamwork in multi-agent systems. It is a novel integration of teammate generation and agent training rather than an incremental improvement.

The paper tackles the challenge of learning to collaborate with unseen partners in Ad Hoc Teamwork. It introduces ROTATE, a unified framework that reformulates the problem as an open-ended learning process between an agent and an adversarial teammate generator. Across diverse two-player environments, ROTATE significantly outperforms baselines at generalizing to unseen teammates.

Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammates with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
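The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-shot cooperative matrix game instead of a sequential environment, a fixed list of candidate teammates instead of a learned generator, and one common definition of regret (best-response return minus the agent's return). All function names (`regret`, `generate_teammate`, `improve_agent`, `rotate_sketch`) are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def regret(agent, teammate, payoff):
    """Assumed regret: best-response return minus the agent's return."""
    per_action = payoff @ teammate  # expected return of each agent action
    return per_action.max() - agent @ per_action

def generate_teammate(agent, payoff, candidates):
    """Adversarial generator (toy): pick the candidate with the highest regret."""
    regrets = [regret(agent, t, payoff) for t in candidates]
    return candidates[int(np.argmax(regrets))]

def improve_agent(logits, population, payoff, lr=0.5, steps=50):
    """Gradient ascent on mean return against all teammates generated so far."""
    mean_value = np.mean([payoff @ t for t in population], axis=0)
    for _ in range(steps):
        agent = softmax(logits)
        # Gradient of (agent @ mean_value) with respect to the softmax logits.
        logits = logits + lr * agent * (mean_value - agent @ mean_value)
    return logits

def rotate_sketch(payoff, candidates, iterations=5):
    """Alternate between teammate generation and agent improvement."""
    logits = np.zeros(payoff.shape[0])
    population = []
    for _ in range(iterations):
        agent = softmax(logits)
        population.append(generate_teammate(agent, payoff, candidates))
        logits = improve_agent(logits, population, payoff)
    return softmax(logits), population

# Toy coordination game: return 1 only when both players pick the same action.
payoff = np.eye(2)
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
agent, population = rotate_sketch(payoff, candidates)
```

In this toy setting the agent is a single fixed action distribution and cannot adapt to its partner within an episode, so the loop only shows the control flow: the generator keeps selecting whichever teammate the current agent handles worst, and the agent is retrained on the growing population.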
