CL · Feb 3

One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence

arXiv:2602.03109v1 · 2 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses AI social intelligence in group conversations. Its approach could advance research in this area, though the contribution is incremental because it builds on existing reinforcement learning methods.

The paper introduces OMAR, a reinforcement learning framework that uses multi-turn, multi-agent self-play so that a single model role-plays every participant in a conversation. In evaluations on SOTOPIA and Werewolf games, the trained models show emergent skills such as empathy and persuasion.

This paper introduces OMAR: One Model, All Roles, a reinforcement learning framework that enables AI to develop social intelligence through multi-turn, multi-agent conversational self-play. Unlike traditional paradigms that rely on static, single-turn optimizations, OMAR allows a single model to role-play all participants in a conversation simultaneously, learning to achieve long-term goals and complex social norms directly from dynamic social interaction. To ensure training stability across long dialogues, we implement a hierarchical advantage estimation that calculates turn-level and token-level advantages. Evaluations in the SOTOPIA social environment and Werewolf strategy games show that our trained models develop fine-grained, emergent social intelligence, such as empathy, persuasion, and compromise seeking, demonstrating the effectiveness of learning collaboration even under competitive scenarios. While we identify practical challenges like reward hacking, our results show that rich social intelligence can emerge without human supervision. We hope this work incentivizes further research on AI social intelligence in group conversations.
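The abstract mentions a hierarchical advantage estimation with turn-level and token-level advantages but does not give the formulas. The sketch below is one plausible reading, not the paper's method. It assumes one scalar reward per turn, a discounted return-to-go per turn, a mean/std baseline across turns, and a simple within-turn decay that spreads each turn's advantage over its tokens. All function names and parameters (`turn_returns`, `turn_advantages`, `token_advantages`, `gamma`, `decay`) are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List


def turn_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return-to-go for each turn of one dialogue (assumed form)."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def turn_advantages(returns: List[float]) -> List[float]:
    """Turn-level advantage: returns centered and scaled across turns.

    The mean/std baseline is an assumption, not taken from the paper.
    """
    mean = fmean(returns)
    std = pstdev(returns)
    scale = std if std > 0 else 1.0  # avoid dividing by zero
    return [(g - mean) / scale for g in returns]


def token_advantages(
    turn_adv: List[float], turn_lengths: List[int], decay: float = 1.0
) -> List[List[float]]:
    """Token-level advantage: spread each turn's advantage over its tokens.

    The token at 0-based position t in a turn of n tokens receives
    adv * decay ** (n - 1 - t), so the last token gets the full advantage.
    With decay = 1.0 every token in the turn gets the same value.
    """
    per_token: List[List[float]] = []
    for adv, n in zip(turn_adv, turn_lengths):
        per_token.append([adv * decay ** (n - 1 - t) for t in range(n)])
    return per_token


if __name__ == "__main__":
    # Toy dialogue: three turns, reward only on the final turn.
    rets = turn_returns([0.0, 0.0, 1.0], gamma=0.5)
    advs = turn_advantages(rets)
    print(token_advantages(advs, turn_lengths=[2, 3, 1]))
```

Splitting the estimate this way keeps credit assignment coarse at the dialogue level (which turn helped) and only then decides how to distribute that credit within a turn. A multi-role setup would presumably compute returns per speaking role; that is left out here for brevity.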

