Traces of Social Competence in Large Language Models
This research addresses the reliability and explanatory potential of the False Belief Test for assessing socio-cognitive competencies in LLMs, which is important for understanding the limitations and biases of current models.
This paper investigates the social competence of 17 open-weight Large Language Models (LLMs) using 192 variants of the False Belief Test (FBT). It finds that while scaling model size generally improves performance, explicating propositional attitudes fundamentally alters response patterns, a phenomenon that emerges during pre-training and can be amplified by reasoning-oriented finetuning.
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues such as data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al. 2023), using Bayesian logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size generally improves performance, though not consistently. A cross-over effect reveals that explicating propositional attitudes ("X thinks") fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented finetuning amplifies it. In a case study analysing social reasoning ability throughout OLMo 2 training, we show that this cross-over effect emerges during pre-training, suggesting that models acquire stereotypical response patterns tied to mental-state vocabulary that can outweigh other scenario semantics. Finally, vector steering allows us to isolate a "think" vector as the causal driver of the observed FBT behaviour.
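The abstract does not say how the steering vector is obtained, so the following is only a minimal sketch of one common approach, difference-of-means steering, and not necessarily the procedure used in the paper. It assumes that hidden activations have already been extracted for matched prompts with and without an explicit mental-state cue. The function names (`mean_vector`, `steering_vector`, `steer`), the scaling factor `alpha`, and the toy activations are all hypothetical and chosen for illustration.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equally sized activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(with_cue, without_cue):
    """Mean activation with the cue minus mean activation without it."""
    mu_with = mean_vector(with_cue)
    mu_without = mean_vector(without_cue)
    return [a - b for a, b in zip(mu_with, mu_without)]


def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations, invented for illustration only.
# Real activations would come from a chosen layer of the model.
explicit_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # prompts containing "X thinks"
implicit_acts = [[0.0, 0.0, 2.0], [2.0, 0.0, 2.0]]  # matched prompts without the cue

think_vec = steering_vector(explicit_acts, implicit_acts)
steered = steer([0.5, 0.5, 0.5], think_vec, alpha=2.0)
```

In this toy case the two prompt sets differ only along the first dimension, so the resulting vector is non-zero only there. In a causal test of the kind the abstract describes, one would add (or subtract) such a vector during a forward pass and check whether FBT responses shift accordingly.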