AI · Apr 25, 2025

MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind

arXiv:2504.18039v4 · 6 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the need for more human-like social reasoning in LLM agents for social deduction games and represents an incremental improvement over existing text-only approaches.

The paper tackles the limited multimodal reasoning and Theory of Mind (ToM) capabilities of social deduction game agents by introducing MultiMind, which combines facial expressions, vocal tones, and a ToM model with Monte Carlo Tree Search (MCTS) and achieves superior performance in gameplay evaluations.

Large Language Model (LLM) agents have demonstrated impressive capabilities in social deduction games (SDGs) like Werewolf, where strategic reasoning and social deception are essential. However, current approaches remain limited to textual information, ignoring crucial multimodal cues such as facial expressions and tone of voice that humans naturally use to communicate. Moreover, existing SDG agents primarily focus on inferring other players' identities without modeling how others perceive themselves or fellow players. To address these limitations, we use One Night Ultimate Werewolf (ONUW) as a testbed and present MultiMind, the first framework integrating multimodal information into SDG agents. MultiMind processes facial expressions and vocal tones alongside verbal content, while employing a Theory of Mind (ToM) model to represent each player's suspicion levels toward others. By combining this ToM model with Monte Carlo Tree Search (MCTS), our agent identifies communication strategies that minimize suspicion directed at itself. Through comprehensive evaluation in both agent-versus-agent simulations and studies with human players, we demonstrate MultiMind's superior performance in gameplay. Our work presents a significant advancement toward LLM agents capable of human-like social reasoning across multimodal domains.
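The abstract does not give MultiMind's algorithm in detail, so the following is only a minimal sketch of the general idea under hypothetical assumptions: a ToM-style belief matrix `belief[i][j]` holding how strongly player `i` suspects player `j`, a small fixed set of communication strategies with hand-picked effects on suspicion, and open-loop MCTS with UCT selection that scores each simulated discussion by how little suspicion ends up directed at the agent. The strategy names, `EFFECTS` values, noise model, and search parameters are illustrative placeholders, not taken from the paper.

```python
"""Toy sketch: MCTS that picks a communication strategy minimizing suspicion toward "me".

All strategy names, effect sizes, and noise levels are hypothetical placeholders.
"""
import math
import random

# Hypothetical communication strategies and their mean effect on how much each
# other player suspects the speaker (negative = less suspicion).
STRATEGIES = ["claim_seer", "claim_villager", "accuse_other", "stay_quiet"]
EFFECTS = {"claim_seer": -0.15, "claim_villager": -0.05,
           "accuse_other": 0.10, "stay_quiet": 0.05}


def suspicion_toward(belief, target):
    """Mean suspicion the other players hold toward `target`.

    belief[i][j] is a ToM-style estimate in [0, 1] of how much player i suspects player j.
    """
    others = [i for i in belief if i != target]
    return sum(belief[i][target] for i in others) / len(others)


def apply_strategy(belief, me, strategy, rng):
    """Return a copy of `belief` after `me` speaks using `strategy` (noisy toy dynamics)."""
    new = {i: dict(row) for i, row in belief.items()}
    for i in new:
        if i != me:
            shifted = new[i][me] + EFFECTS[strategy] + rng.gauss(0.0, 0.05)
            new[i][me] = min(1.0, max(0.0, shifted))  # keep suspicion in [0, 1]
    return new


class Node:
    """Open-loop search node, identified by the sequence of strategies from the root."""

    def __init__(self):
        self.children = {}  # strategy -> Node
        self.visits = 0
        self.total = 0.0  # sum of rollout rewards


def uct_child(node, c):
    """Pick the (strategy, child) pair with the highest UCT score."""
    log_n = math.log(node.visits)
    return max(node.children.items(),
               key=lambda kv: kv[1].total / kv[1].visits
               + c * math.sqrt(log_n / kv[1].visits))


def search(belief, me, horizon=3, iterations=2000, c=0.5, seed=0):
    """Return the first strategy whose simulated discussions leave the least suspicion on `me`."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node, state, path, depth = root, belief, [root], 0
        # Selection: follow UCT while every strategy at this node has been tried.
        while depth < horizon and len(node.children) == len(STRATEGIES):
            strategy, node = uct_child(node, c)
            state = apply_strategy(state, me, strategy, rng)
            path.append(node)
            depth += 1
        # Expansion: try one strategy not yet explored from this node.
        if depth < horizon:
            strategy = rng.choice([s for s in STRATEGIES if s not in node.children])
            child = Node()
            node.children[strategy] = child
            node = child
            state = apply_strategy(state, me, strategy, rng)
            path.append(node)
            depth += 1
        # Rollout: random strategies until the end of the discussion horizon.
        while depth < horizon:
            state = apply_strategy(state, me, rng.choice(STRATEGIES), rng)
            depth += 1
        # Reward is high when little suspicion is directed at `me`.
        reward = 1.0 - suspicion_toward(state, me)
        for n in path:
            n.visits += 1
            n.total += reward
    # Recommend the most-visited root strategy (the usual robust choice).
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


# Hypothetical initial ToM beliefs: example_belief[i][j] = how much i suspects j.
example_belief = {
    "me": {"me": 0.0, "alice": 0.3, "bob": 0.4},
    "alice": {"me": 0.6, "alice": 0.0, "bob": 0.2},
    "bob": {"me": 0.5, "alice": 0.3, "bob": 0.0},
}

if __name__ == "__main__":
    print(search(example_belief, "me"))
```

With these toy numbers the search should favor `claim_seer`, the strategy assigned the largest suspicion-reducing effect. Here the initial suspicion levels are fixed numbers rather than estimates derived from dialogue, facial expressions, or vocal tone, and `apply_strategy` is a toy stand-in for how other players might react. Open-loop search (re-simulating from the root on every iteration) is used because the hypothetical transitions are noisy, so a tree node stands for a sequence of strategies rather than a single belief state.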
