CL CYMay 23, 2025

Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States

Yang Xiao, Jiashuo Wang, Qiancheng Xu, Changhe Song, Chunpu Xu, Yi Cheng, Wenjie Li, Pengfei Liu

arXiv:2505.17663v215 citationsh-index: 8ACL

Originality Incremental advance

AI Analysis

This addresses a crucial gap in assessing LLMs for real-world human-AI interactions, though it is incremental as it builds on existing ToM benchmarks.

The paper tackled the problem of evaluating LLMs' ability to track dynamic mental states in social interactions, revealing that ten state-of-the-art LLMs underperform humans by 44.7% on a new benchmark.

As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities - particularly their ability to track dynamic mental states - becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present \textsc{DynToM}, a novel benchmark specifically designed to evaluate LLMs' ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance underperforms humans by 44.7\%, with performance degrading significantly when tracking and reasoning about the shift of mental states. This performance gap highlights fundamental limitations in current LLMs' ability to model the dynamic nature of human mental states.

View on arXiv PDF

Similar