MA AI CL FL LG
Nov 4, 2025

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

arXiv:2511.02304v1
2 citations
h-index: 72
Originality: Highly original
AI Analysis

The paper addresses sample inefficiency and single-task limitations in cooperative multi-agent reinforcement learning, with applications in domains such as robotics and automation.

The paper proposes ACC-MARL, a framework for learning multi-task, multi-agent policies for cooperative, temporal objectives. Agents trained with it show emergent task-aware coordination, such as pressing buttons and holding doors. The authors prove the approach correct and show that the value functions of the learned policies can be used to assign tasks optimally at test time.

We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks enables the decomposition of complex tasks into simpler sub-tasks that can be assigned to agents. However, existing approaches remain sample-inefficient and are limited to the single-task case. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify the main challenges to ACC-MARL's feasibility in practice, propose solutions, and prove the correctness of our approach. We further show that the value functions of learned policies can be used to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, e.g., pressing a button to unlock a door, holding the door, and short-circuiting tasks.
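To make two ideas from the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation. The `DFA` class, the `assign_tasks` helper, the event names `"button"` and `"door"`, and the value numbers are all hypothetical. The sketch also assumes that the team value of an assignment is the sum of per-agent value estimates, which the abstract does not state. The first part encodes a temporal task ("press the button, then reach the door") as a deterministic finite automaton. The second part picks the agent-to-task assignment with the highest total estimated value by brute force.

```python
from itertools import permutations


class DFA:
    """Deterministic finite automaton over symbolic events (hypothetical helper)."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # Maps (state, symbol) -> next state.
        self.transitions = dict(transitions)

    def step(self, state, symbol):
        # Pairs that are not listed leave the state unchanged (implicit self-loop).
        return self.transitions.get((state, symbol), state)

    def accepts(self, trace):
        # Run the automaton over the whole event trace and check the final state.
        state = self.start
        for symbol in trace:
            state = self.step(state, symbol)
        return state in self.accepting


def assign_tasks(values):
    """Pick the assignment with the highest summed value.

    values[i][j] is an assumed value estimate for agent i conditioned on task j.
    The matrix must be square (one task per agent). Brute force over all
    permutations, so only suitable for small teams.
    """
    n = len(values)

    def total(perm):
        return sum(values[i][perm[i]] for i in range(n))

    best = max(permutations(range(n)), key=total)
    return best, total(best)


# Task: see "button" first, then "door". State 2 is the only accepting state.
button_then_door = DFA(
    start=0,
    accepting={2},
    transitions={(0, "button"): 1, (1, "door"): 2},
)

# Illustrative value estimates for two agents and two tasks (made-up numbers).
example_values = [
    [0.9, 0.2],
    [0.4, 0.8],
]
example_assignment, example_total = assign_tasks(example_values)
```

In this sketch, `example_assignment[i]` is the index of the task given to agent `i`. For larger teams, the brute-force search could be swapped for a polynomial-time assignment solver; that is a design choice of this example, not something taken from the paper.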

