CL | AI | CY | Jul 27, 2025

Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations

AI2, CMU
arXiv:2507.20409v1 | 3 citations | h-index: 49
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making multimodal AI systems safer and more reliable in social situations, representing an incremental improvement over chain-of-thought prompting.

The paper aims to improve vision-language models' (VLMs') reasoning in social contexts by introducing Cognitive Chain-of-Thought (CoCoT), a prompting strategy with three stages: perception, situation, and norm. Across multimodal benchmarks, the authors report an average gain of +8% over chain-of-thought and direct prompting.

Chain-of-Thought (CoT) prompting helps models think step by step. But what happens when they must see, understand, and judge, all at once? In visual tasks grounded in social context, where bridging perception with norm-grounded judgments is essential, flat CoT often breaks down. We introduce Cognitive Chain-of-Thought (CoCoT), a prompting strategy that scaffolds VLM reasoning through three cognitively inspired stages: perception, situation, and norm. Our experiments show that, across multiple multimodal benchmarks (including intent disambiguation, commonsense reasoning, and safety), CoCoT consistently outperforms CoT and direct prompting (+8% on average). Our findings demonstrate that cognitively grounded reasoning stages enhance interpretability and social awareness in VLMs, paving the way for safer and more reliable multimodal systems.
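
To make the three-stage structure concrete, here is a minimal sketch of how a staged prompt might be assembled next to a flat CoT baseline. Only the stage names (perception, situation, norm) come from the abstract. The instruction wording, the function names `build_cocot_prompt` and `build_flat_cot_prompt`, and the example question are illustrative assumptions, not the authors' actual prompts.

```python
# Hypothetical sketch: stage names are taken from the abstract; the
# instruction text for each stage is an assumption made for illustration.
STAGES = [
    ("Perception", "Describe what is directly observable in the image."),
    ("Situation", "Explain the social context implied by those observations."),
    ("Norm", "State which social norms apply and use them to judge the question."),
]


def build_cocot_prompt(question: str) -> str:
    """Return a text prompt that asks for reasoning in three ordered stages."""
    lines = [
        f"Question: {question}",
        "",
        "Reason in the following order before answering:",
    ]
    # Number the stages so the model is asked to address them in sequence.
    for i, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    lines.append("")
    lines.append("Final answer:")
    return "\n".join(lines)


def build_flat_cot_prompt(question: str) -> str:
    """Baseline for comparison: a single generic step-by-step instruction."""
    return f"Question: {question}\n\nLet's think step by step.\n\nFinal answer:"


if __name__ == "__main__":
    print(build_cocot_prompt("Is the person's gesture a greeting or a warning?"))
```

The returned string would be sent to a VLM together with the image. How the image is attached depends on the model API in use and is left out of this sketch.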
