CL | AI | Oct 25, 2023

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

arXiv:2310.16755v1 | 162 citations | h-index: 50
Originality: Incremental advance
AI Analysis

This work addresses a gap in assessing advanced cognitive abilities in AI: Theory of Mind (ToM) reasoning beyond the first and second order. It is relevant to researchers in natural language processing and cognitive science, though the advance is incremental because it builds on existing ToM benchmarks.

The authors evaluate higher-order Theory of Mind reasoning in large language models by introducing the HI-TOM benchmark. They find that performance declines on higher-order tasks, which points to limitations in current models.

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.
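
To make "recursive reasoning on others' beliefs" concrete, here is a minimal Python sketch of how the order of a ToM question grows with the length of the belief chain. It is an illustration only, not the HI-TOM data format or the authors' code: the function name `nested_belief`, the story, the agent names, and the rule that a move enters a nested belief only when every agent in the chain witnessed it are all simplifying assumptions made for this example.

```python
def nested_belief(chain, initial, moves):
    """Return where the object is according to a nested belief chain.

    chain   -- agent names, outermost believer first; len(chain) is the ToM order
    initial -- starting location, assumed to be seen by everyone
    moves   -- list of (new_location, witnesses) pairs in story order
    """
    location = initial
    for new_location, witnesses in moves:
        # Simplifying assumption (hypothetical rule for this sketch):
        # a move updates the nested belief only if every agent in the
        # chain witnessed it. An empty chain therefore tracks reality.
        if all(agent in witnesses for agent in chain):
            location = new_location
    return location


# Hypothetical story: the ball starts on the shelf and is moved three times
# while agents leave the room one by one.
moves = [
    ("basket", {"Anne", "Bob", "Cara"}),
    ("box", {"Anne", "Bob"}),    # Cara has left the room
    ("drawer", {"Anne"}),        # Bob has left the room as well
]

print(nested_belief([], "shelf", moves))                       # order 0: reality
print(nested_belief(["Bob"], "shelf", moves))                  # order 1
print(nested_belief(["Anne", "Bob"], "shelf", moves))          # order 2
print(nested_belief(["Anne", "Bob", "Cara"], "shelf", moves))  # order 3
```

Each additional name in `chain` adds one level of nesting ("Where does Anne think Bob thinks Cara thinks the ball is?"), which is the sense in which third- and fourth-order questions go beyond the first- and second-order cases. This sketch does not model communication between agents or any other story element beyond witnessed moves.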

Code Implementations: 1 repo
