LG, CL | Mar 18, 2025

Don't lie to your friends: Learning what you know from collaborative self-play

arXiv:2503.14481v3 | 12 citations | h-index: 86
Originality: Highly original
AI Analysis

This paper addresses the challenge of enabling AI assistants to better manage their knowledge and tool interactions, though its contribution is incremental: it applies multi-agent collaboration to this specific bottleneck.

The paper tackles the problem of teaching AI agents to recognize their own capabilities and limitations, such as when to use a tool and when to abstain. It proposes collaborative self-play, in which multi-agent groups are rewarded for reaching correct answers. The resulting policies transfer to individual-agent deployments, where they improve tool use and selective prediction.

To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: collaborative self-play. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that transfer to improve tool use and selective prediction in settings where individual agents are deployed in isolation.
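The incentive structure described in the abstract (a shared reward for a correct group answer, traded off against effort) could be sketched roughly as follows. This is an illustrative assumption, not the paper's actual reward: the names AgentTurn, group_reward and per_agent_rewards, the exact-match correctness check, and the per-call tool_cost value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentTurn:
    """One turn taken by one agent in a collaboration (hypothetical structure)."""
    agent_id: str
    used_tool: bool  # True if the agent called its retrieval tool on this turn
    message: str     # what the agent told the rest of the group


def group_reward(turns, final_answer, gold_answer, tool_cost=0.1):
    """Shared reward: 1.0 if the group's answer is correct, minus a cost per tool call.

    The exact-match check and the linear effort penalty are assumptions made
    for illustration only.
    """
    correct = final_answer.strip().lower() == gold_answer.strip().lower()
    effort = tool_cost * sum(1 for t in turns if t.used_tool)
    return (1.0 if correct else 0.0) - effort


def per_agent_rewards(turns, final_answer, gold_answer, tool_cost=0.1):
    """Give every participating agent the same group-level reward."""
    reward = group_reward(turns, final_answer, gold_answer, tool_cost)
    return {t.agent_id: reward for t in turns}


# Example episode: one agent retrieves, the other defers to it.
episode = [
    AgentTurn("asker", used_tool=False, message="Does anyone know the capital of France?"),
    AgentTurn("helper", used_tool=True, message="My corpus says Paris."),
]
rewards = per_agent_rewards(episode, final_answer="Paris", gold_answer="paris")
```

Because every agent receives the same reward, an agent gains nothing by bluffing about what it knows, and an unnecessary tool call lowers the reward for the whole group. That is the intuition behind the sketch, under the assumptions stated above.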
