AI · Nov 19, 2025

Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs

arXiv:2511.15895v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding cognitive mechanisms in LLMs for researchers in AI and cognitive science, but it is incremental as it builds on prior activation steering methods.

The study examines the unclear internal mechanisms behind the Theory of Mind (ToM) gains that activation steering produces in LLMs, finding that the improved performance on belief attribution tasks (from 32.5% to 46.7% accuracy) is mediated by emotional processing rather than analytical reasoning.

Recent work shows that activation steering substantially improves language models' Theory of Mind (ToM) (Bortoletto et al. 2024), yet the internal changes that lead to the different outputs remain unclear. We propose decomposing ToM in LLMs by comparing the activations of steered and baseline LLMs using linear probes trained on 45 cognitive actions. We applied Contrastive Activation Addition (CAA) steering to Gemma-3-4B and evaluated it on 1,000 BigToM forward belief scenarios (Gandhi et al. 2023). We find that the improved performance on belief attribution tasks (32.5% to 46.7% accuracy) is mediated by activations that process emotional content, namely emotion perception (+2.23) and emotion valuing (+2.20), while analytical processes are suppressed: questioning (-0.78) and convergent thinking (-1.59). This suggests that successful ToM abilities in LLMs are mediated by emotional understanding, not analytical reasoning.
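To make the two ingredients in the abstract concrete, here is a minimal NumPy sketch of the general idea behind contrastive activation steering and probe-based comparison. It is not the authors' implementation. It assumes the steering vector is the mean difference between activations for contrasting examples, and that a probe's response is summarised by the mean change in its logit. The abstract does not say how the reported shifts (such as +2.23) were computed. The activations are synthetic toy data, and every function name, shape, and scaling factor `alpha` is hypothetical.

```python
import numpy as np


def caa_steering_vector(pos_acts, neg_acts):
    """Mean difference between activations for contrasting example sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def apply_steering(acts, vector, alpha=1.0):
    """Add the scaled steering vector to every activation row."""
    return acts + alpha * vector


def probe_logit_shift(probe_w, probe_b, baseline, steered):
    """Mean change in a linear probe's logit, steered minus baseline."""
    base_logits = baseline @ probe_w + probe_b
    steered_logits = steered @ probe_w + probe_b
    return float((steered_logits - base_logits).mean())


# Toy data only: random vectors stand in for hidden-layer activations.
rng = np.random.default_rng(0)
d = 8                                    # hypothetical hidden size
pos = rng.normal(size=(16, d)) + 1.0     # stand-in for "desired behaviour" activations
neg = rng.normal(size=(16, d))           # stand-in for "undesired behaviour" activations

v = caa_steering_vector(pos, neg)
baseline = rng.normal(size=(32, d))
alpha = 2.0                              # hypothetical steering strength
steered = apply_steering(baseline, v, alpha=alpha)

# Hypothetical probe for one cognitive action (weights are random here,
# whereas a real probe would be trained on labelled activations).
probe_w = rng.normal(size=d)
probe_b = 0.0
shift = probe_logit_shift(probe_w, probe_b, baseline, steered)
```

Because both the steering and the probe are linear in this sketch, the mean logit shift equals `alpha * (v @ probe_w)`. A probe whose direction aligns with the steering vector reports a positive shift, and one that opposes it reports a negative shift.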

