AI · CL · Jan 23

Reasoning Promotes Robustness in Theory of Mind Tasks

arXiv:2601.16853v1 · h-index: 17
Originality: Synthesis-oriented
AI Analysis

This paper addresses the evaluation of social-cognitive behavior in LLMs, but its contribution is incremental: it clarifies existing capabilities rather than introducing new methods.

The paper investigates how reasoning-oriented LLMs behave in Theory of Mind (ToM) tasks. It finds that they are more robust to prompt variations and task perturbations, and attributes the gains to more reliable solution-finding rather than to new forms of ToM reasoning.

Large language models (LLMs) have recently shown strong performance on Theory of Mind (ToM) tests, prompting debate about the nature and true performance of the underlying capabilities. At the same time, reasoning-oriented LLMs trained via reinforcement learning with verifiable rewards (RLVR) have achieved notable improvements across a range of benchmarks. This paper examines the behavior of such reasoning models in ToM tasks, using novel adaptations of machine psychological experiments and results from established benchmarks. We observe that reasoning models consistently exhibit increased robustness to prompt variations and task perturbations. Our analysis indicates that the observed gains are more plausibly attributed to increased robustness in finding the correct solution, rather than to fundamentally new forms of ToM reasoning. We discuss the implications of this interpretation for evaluating social-cognitive behavior in LLMs.
