AI · Apr 16, 2024

N-Agent Ad Hoc Teamwork

arXiv:2404.10740v3 · 18 citations · h-index: 12 · IJCAI
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scalable, flexible cooperation among autonomous systems such as self-driving cars, though it is an incremental advance that extends existing ad hoc teamwork frameworks.

The paper tackles cooperative multi-agent learning in less restrictive, real-world settings. It introduces N-agent ad hoc teamwork (NAHT), in which agents must cooperate with varying teammates, and proposes the POAM algorithm, which improves task returns and generalizes to unseen teammates in empirical evaluations.
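To make the NAHT setting concrete, here is a minimal sketch of how an episode's team might be assembled, assuming a fixed number of agent slots where the learning algorithm controls a randomly sized subset and the remaining slots are filled by uncontrolled teammates. The function `sample_naht_team`, the `Slot` type, and the teammate labels are hypothetical illustrations, not the paper's code. Note that the two extremes of the sampled count recover the settings the paper generalizes: one controlled agent is classic ad hoc teamwork, and all controlled agents is fully cooperative MARL.

```python
import random
from dataclasses import dataclass


@dataclass
class Slot:
    agent_id: int
    controlled: bool   # True if this slot is driven by the learning algorithm
    policy_type: str   # label of the policy filling this slot (hypothetical)


def sample_naht_team(n_agents, teammate_pool, rng):
    """Hypothetical NAHT team sampler (illustrative only, not from the paper).

    Draws how many of the n_agents slots the learner controls (anywhere from
    1, i.e. ad hoc teamwork, to n_agents, i.e. fully cooperative MARL), then
    fills every other slot with an uncontrolled teammate from teammate_pool.
    """
    n_controlled = rng.randint(1, n_agents)  # inclusive on both ends
    controlled_ids = set(rng.sample(range(n_agents), n_controlled))
    team = []
    for i in range(n_agents):
        if i in controlled_ids:
            team.append(Slot(i, True, "learner"))
        else:
            team.append(Slot(i, False, rng.choice(teammate_pool)))
    return team


# Usage: each call can yield a different number and mix of teammates.
rng = random.Random(0)
for slot in sample_naht_team(5, ["greedy", "random", "ippo"], rng):
    print(slot)
```

Because the controlled count and teammate types are resampled per episode, a learner trained this way cannot assume a fixed team composition, which is the core difficulty NAHT is meant to capture.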

Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent. However, many real-world cooperative settings are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. To expand the class of scenarios that cooperative learning methods can optimally address, we introduce N-agent ad hoc teamwork (NAHT), in which a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates. This paper formalizes the problem and proposes the Policy Optimization with Agent Modelling (POAM) algorithm. POAM is a policy-gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammate behaviors by learning representations of those behaviors. Empirical evaluation on tasks from the multi-agent particle environment and StarCraft II shows that POAM improves cooperative task returns compared to baseline approaches and enables out-of-distribution generalization to unseen teammates.
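The agent-modelling idea, conditioning a policy on a representation of teammate behavior, can be illustrated with a toy sketch. POAM learns this representation; here, as a loudly stated simplification, a hand-crafted feature (the empirical frequency of observed teammate actions) stands in for the learned encoder, and a linear softmax policy stands in for the policy network. All names (`teammate_embedding`, `conditioned_policy`, the weights) are hypothetical, and the sketch omits the policy-gradient training that POAM actually performs.

```python
import math
from collections import Counter


def teammate_embedding(action_history, n_actions):
    """Hand-crafted stand-in for a learned teammate encoder (hypothetical):
    the empirical frequency of each action the teammates have taken so far."""
    if not action_history:
        # Uninformative uniform vector before any teammate actions are observed.
        return [1.0 / n_actions] * n_actions
    counts = Counter(action_history)
    total = len(action_history)
    return [counts.get(a, 0) / total for a in range(n_actions)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def conditioned_policy(obs, embedding, weights):
    """Linear softmax policy over the concatenated features [obs, embedding].
    weights[a] is a hypothetical weight vector for action a."""
    features = list(obs) + list(embedding)
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


# Usage: the same observation yields different action distributions
# depending on what the teammates have been observed doing.
emb = teammate_embedding([0, 0, 1, 2, 0], n_actions=3)
print(emb)  # → [0.6, 0.2, 0.2]
weights = [[0.0] * 5, [0.0] * 5, [0.0, 0.0, 0.0, 0.0, 5.0]]
print(conditioned_policy([0.5, -0.2], emb, weights))
print(conditioned_policy([0.5, -0.2], teammate_embedding([2, 2, 2], 3), weights))
```

In this toy, the third weight row reacts to how often teammates chose action 2, so the agent's behavior shifts with its teammates; a learned encoder plays the same role with a representation fitted during training rather than fixed by hand.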

Code Implementations: 1 repo
