CL · AI · Jun 16, 2025

MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?

arXiv:2506.13065v1 · 3 citations · h-index: 28 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the gap in assessing motivational reasoning in LLMs for applications like social simulations and AI companions, though it is incremental as it builds on existing benchmarking efforts.

The authors evaluate how well large language models (LLMs) replicate human-like motivations by introducing MotiveBench, a benchmark of 200 scenarios and 600 tasks. They find that even advanced LLMs fall short of human-like motivational reasoning, particularly for "love & belonging" motivations.

Large language models (LLMs) have been widely adopted as the core of agent frameworks in various scenarios, such as social simulations and AI companions. However, the extent to which they can replicate human-like motivations remains an underexplored question. Existing benchmarks are constrained by simplistic scenarios and the absence of character identities, resulting in an information asymmetry with real-world situations. To address this gap, we propose MotiveBench, which consists of 200 rich contextual scenarios and 600 reasoning tasks covering multiple levels of motivation. Using MotiveBench, we conduct extensive experiments on seven popular model families, comparing different scales and versions within each family. The results show that even the most advanced LLMs still fall short in achieving human-like motivational reasoning. Our analysis reveals key findings, including the difficulty LLMs face in reasoning about "love & belonging" motivations and their tendency toward excessive rationality and idealism. These insights highlight a promising direction for future research on the humanization of LLMs. The dataset, benchmark, and code are available at https://aka.ms/motivebench.
