CL · May 8

SCENE: Recognizing Social Norms and Sanctioning in Group Chats

arXiv:2605.07823 · 19.0

Predicted impact top 32% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For AI safety and social AI research, this provides a dynamic evaluation of LLMs' social norm adaptation, addressing a gap in interactional testing.

The paper introduces SCENE, a benchmark for evaluating LLMs' ability to recognize and adapt to implicit social norms in group chats. Results show Claude Opus 4.7 and Gemini 3.1 Pro adapt significantly better than open-weight models.

Online group chats are social spaces with implicit behavior patterns that, when broken, are often met with social sanctioning from the group. The ability and willingness of LLM-based agents to recognize and adapt to these norms remains mostly unexplored. We introduce SCENE, a social-interaction benchmark focused on implicit norms and social sanctioning in multi-party chat. SCENE generates plausible non-roleplay scenarios with scripted personas that follow a hidden norm, create opportunities for the subject agent to violate it, and sanction breaches when they occur. We further propose behavioral evaluation metrics for two functional adaptation abilities: responsiveness to negative sanctioning, and adapting norms from peers' behavior. We evaluate six frontier and open-weight models on SCENE. Our results show that Claude Opus 4.7 and Gemini 3.1 Pro adapt to implicit norms significantly more than the evaluated open-weight models. SCENE contributes one benchmark in the direction of recent calls for dynamic, interactional evaluation of LLM social capabilities.
