CL, Mar 4, 2024

Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground

Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen Rambow

arXiv:2403.02451v2, 16.631 citations, h-index: 6, ACL

Originality: Incremental advance

AI Analysis

This work addresses a concern for AI researchers: ToM benchmarks built on synthetic data risk being misaligned with human behavior. It is incremental because it builds on existing evaluation efforts.

The paper evaluates theory of mind in language models by introducing Common-ToM, the first ToM dataset based on naturally occurring spoken dialogs. It finds that LMs struggle on this dataset and that integrating a simple, explicit representation of beliefs improves their performance.

Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
