Reasoning Promotes Robustness in Theory of Mind Tasks
This paper addresses the evaluation of social-cognitive behavior in LLMs, but its contribution is incremental: it clarifies existing capabilities rather than introducing new methods.
The paper investigates reasoning-oriented LLMs in Theory of Mind (ToM) tasks and finds that they are more robust to prompt variations and task perturbations. It attributes these gains to more reliable solution-finding rather than to new forms of ToM reasoning.
Large language models (LLMs) have recently shown strong performance on Theory of Mind (ToM) tests, prompting debate about the nature and true performance of the underlying capabilities. At the same time, reasoning-oriented LLMs trained via reinforcement learning with verifiable rewards (RLVR) have achieved notable improvements across a range of benchmarks. This paper examines the behavior of such reasoning models in ToM tasks, using novel adaptations of machine psychology experiments and results from established benchmarks. We observe that reasoning models consistently exhibit increased robustness to prompt variations and task perturbations. Our analysis indicates that the observed gains are more plausibly attributed to increased robustness in finding the correct solution than to fundamentally new forms of ToM reasoning. We discuss the implications of this interpretation for evaluating social-cognitive behavior in LLMs.