AI, CL · Jun 2, 2025

Overcoming Multi-step Complexity in Multimodal Theory-of-Mind Reasoning: A Scalable Bayesian Planner

MIT
arXiv:2506.01301v1 · 5 citations · h-index: 30 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of modeling human mental states in complex, multimodal environments for AI systems. It represents an incremental advance with quantified accuracy gains.

The paper tackles scalability and generalization in multimodal Theory-of-Mind (ToM) reasoning. It proposes a scalable Bayesian planner that decomposes reasoning into stepwise Bayesian updates and reports a 4.6% accuracy improvement over state-of-the-art methods on multimodal ToM benchmarks.
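To make "stepwise Bayesian updates" concrete, here is a minimal sketch, assuming a discrete set of candidate mental-state hypotheses and per-step likelihoods already supplied by some model. The hypothesis names (`goal_apple`, `goal_cup`) and all likelihood values are hypothetical placeholders; in the paper the likelihoods come from language models, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical illustration: hypotheses and likelihoods are placeholders,
# not values or code from the paper.


def normalize_log(log_w: Dict[str, float]) -> Dict[str, float]:
    """Normalize log-weights so the exponentiated values sum to 1 (log-sum-exp)."""
    m = max(log_w.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w.values()))
    return {h: v - log_z for h, v in log_w.items()}


def stepwise_posteriors(
    prior: Dict[str, float],
    step_likelihoods: Sequence[Dict[str, float]],
) -> List[Dict[str, float]]:
    """Apply Bayes' rule once per observed step and record each posterior."""
    log_post = normalize_log({h: math.log(p) for h, p in prior.items()})
    history = []
    for lik in step_likelihoods:
        # posterior_t(h) is proportional to posterior_{t-1}(h) * P(step_t | h)
        log_post = normalize_log({h: v + math.log(lik[h]) for h, v in log_post.items()})
        history.append({h: math.exp(v) for h, v in log_post.items()})
    return history


# Two hypothetical goal hypotheses and two observed steps.
prior = {"goal_apple": 0.5, "goal_cup": 0.5}
steps = [
    {"goal_apple": 0.8, "goal_cup": 0.4},  # hypothetical P(step 1 | goal)
    {"goal_apple": 0.9, "goal_cup": 0.3},  # hypothetical P(step 2 | goal)
]
history = stepwise_posteriors(prior, steps)
for t, post in enumerate(history, start=1):
    print(t, {h: round(p, 3) for h, p in post.items()})
```

With these toy numbers the belief in `goal_apple` rises from 0.5 to 2/3 after the first step and to 6/7 (about 0.857) after the second. Working in log space keeps long action sequences from underflowing, and normalizing after every step exposes the intermediate belief at each step.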

Theory-of-Mind (ToM) enables humans to infer mental states (such as beliefs, desires, and intentions), forming the foundation of social cognition. However, existing computational ToM methods rely on structured workflows with ToM-specific priors or deep model fine-tuning, which struggle with scalability in multimodal environments and fail to generalize as task complexity increases. To address these limitations, we propose a scalable Bayesian ToM planner that decomposes ToM reasoning into stepwise Bayesian updates. Our framework introduces weak-to-strong control, allowing smaller language models (LMs) to specialize in ToM-specific likelihood estimation and transfer their reasoning behaviors to larger LMs (7B to 405B) for integration with social and world knowledge. This synergistic approach aligns large-model inference of human mental states with Bayesian principles. Extensive experiments show that our method achieves a 4.6% accuracy improvement over state-of-the-art techniques on multimodal ToM benchmarks, including challenging unseen scenarios, thereby establishing a new standard for modeling human mental states in complex environments.
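The abstract does not say exactly how the smaller model's behavior is transferred to the larger one. One well-known weak-to-strong scheme, proxy-tuning, adds the difference between a specialized and an unspecialized small model's logits to the large model's logits; the sketch below assumes that rule purely for illustration. The function names, the `alpha` scaling factor, and the toy logit values are all hypothetical, and the paper's actual rule may differ.

```python
import math
from typing import List, Sequence

# Hypothetical illustration of proxy-tuning-style logit arithmetic;
# the paper's exact weak-to-strong rule may differ.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weak_to_strong_logits(
    large: Sequence[float],
    small_tuned: Sequence[float],
    small_base: Sequence[float],
    alpha: float = 1.0,  # hypothetical strength of the transferred behavior
) -> List[float]:
    """Shift the large model's logits by the tuned-minus-base small-model delta."""
    if not (len(large) == len(small_tuned) == len(small_base)):
        raise ValueError("all logit vectors must cover the same candidates")
    return [big + alpha * (tuned - base)
            for big, tuned, base in zip(large, small_tuned, small_base)]


# Toy logits over two candidate answers to a belief question (hypothetical values).
candidates = ["yes", "no"]
large = [1.0, 1.2]        # large model alone slightly prefers "no"
small_tuned = [2.0, 0.0]  # small model specialized for ToM likelihoods
small_base = [0.5, 0.5]   # same small model before specialization
combined = weak_to_strong_logits(large, small_tuned, small_base)
probs = softmax(combined)
print(dict(zip(candidates, (round(p, 3) for p in probs))))
```

Here the tuned-minus-base delta flips the large model's preference from "no" to "yes". Because only the difference between the two small models is added, the large model keeps its own general knowledge while the delta carries the specialized behavior.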
