CV · Jun 2

GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs

arXiv:2606.04184 · 86.8
Predicted impact: top 20% in CV · last 90 days
Originality: Highly original
AI Analysis

This work identifies and measures a critical limitation of MLLMs: they cannot model group-level social emergence. The finding is aimed at AI researchers building socially intelligent agents.

The paper introduces GroupToM-Bench, the first multimodal benchmark for group-level Theory of Mind, and finds that current multimodal large language models significantly underperform humans, revealing a failure to process social structures and non-linear collective dynamics.

True general intelligence requires not only a model of the physical world but also a social world model: the capacity to infer how individual mental states interact and crystallize into group-level outcomes. Despite notable progress in individual-level Theory of Mind (ToM) reasoning, existing multimodal large language models fail at this broader task. Collective behavior emerges non-linearly from social tensions, conformity dynamics, and structural constraints, meaning it cannot be recovered by merely summing individual intentions. We present GroupToM-Bench, the first multimodal benchmark for group-level ToM, built around a causal chain spanning micro-level BDI states (belief, desire, intention), meso-level group tension and structural constraints, and macro-level outcome prediction and mechanistic attribution. To probe this full arc, we develop a seven-level cognitive audit framework. Experiments reveal a gap between current models and human baselines, highlighting a failure to process social structures and non-linear collective dynamics.
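
The claim that collective behavior cannot be recovered by summing individual intentions can be illustrated with a classic threshold model of collective behavior (Granovetter, 1978). The sketch below is not taken from the paper or from GroupToM-Bench. The function name `cascade_size` and the two populations are hypothetical, chosen only to show how a small change in one individual's disposition can flip the group-level outcome.

```python
def cascade_size(thresholds):
    """Granovetter-style threshold model (illustrative assumption, not the
    paper's method): a person acts once the number of people already acting
    is at least that person's threshold."""
    active = 0
    while True:
        # Count everyone whose threshold is met by the current crowd size.
        joined = sum(1 for t in thresholds if t <= active)
        if joined == active:
            # Fixed point reached: nobody else is willing to join.
            return active
        active = joined


# Hypothetical populations of 100 people that differ in a single individual.
uniform = list(range(100))                 # thresholds 0, 1, 2, ..., 99
perturbed = [0, 2] + list(range(2, 100))   # the threshold-1 person now needs 2

print(cascade_size(uniform))    # 100: each joiner triggers the next
print(cascade_size(perturbed))  # 1: the chain breaks after the first person
```

The two populations are almost identical in aggregate (their thresholds differ by one unit in one person), yet the outcomes are a full cascade versus a single actor. An additive summary of individual states would not distinguish them, which is the kind of non-linear dependence on group structure that the abstract describes.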

