CL AI CY | Jul 27, 2025

Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations

Eunkyu Park, Wesley Hanwen Deng, Gunhee Kim, Motahhare Eslami, Maarten Sap

AI2, CMU

arXiv:2507.20409v1 | 3 citations | h-index: 49

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making multimodal AI systems safer and more reliable in social situations, representing an incremental improvement over chain-of-thought prompting.

The paper tackles the problem of improving vision-language models' reasoning in social contexts by introducing Cognitive Chain-of-Thought (CoCoT), a prompting strategy with three stages (perception, situation, and norm), which yields an average gain of 8% over CoT and direct prompting on multimodal benchmarks.

Chain-of-Thought (CoT) prompting helps models think step by step. But what happens when they must see, understand, and judge, all at once? In visual tasks grounded in social context, where bridging perception with norm-grounded judgments is essential, flat CoT often breaks down. We introduce Cognitive Chain-of-Thought (CoCoT), a prompting strategy that scaffolds VLM reasoning through three cognitively inspired stages: perception, situation, and norm. Our experiments show that, across multiple multimodal benchmarks (including intent disambiguation, commonsense reasoning, and safety), CoCoT consistently outperforms CoT and direct prompting (+8% on average). Our findings demonstrate that cognitively grounded reasoning stages enhance interpretability and social awareness in VLMs, paving the way for safer and more reliable multimodal systems.
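To make the three-stage structure concrete, the following is a minimal sketch of how a staged prompt might be assembled next to a flat CoT prompt. Only the stage names (perception, situation, norm) come from the abstract. The instruction wording, the helper names `build_cocot_prompt` and `build_flat_cot_prompt`, and the prompt layout are illustrative assumptions, not the authors' actual prompts.

```python
from typing import Sequence

# Stage names follow the abstract. The instruction text for each stage is
# an assumption made for illustration, not the wording used in the paper.
STAGES = (
    ("Perception", "Describe what is directly observable in the image."),
    ("Situation", "Interpret the social context implied by those observations."),
    ("Norm", "Identify the social norms that apply, then give a judgment."),
)


def build_cocot_prompt(question: str, options: Sequence[str] = ()) -> str:
    """Assemble a three-stage text prompt (hypothetical helper)."""
    lines = [f"Question: {question}"]
    if options:
        lines.append("Options: " + "; ".join(options))
    lines.append("Reason through the following stages in order:")
    for number, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    lines.append("Answer:")
    return "\n".join(lines)


def build_flat_cot_prompt(question: str) -> str:
    """Flat CoT baseline: a single unstructured step-by-step cue."""
    return f"Question: {question}\nLet's think step by step.\nAnswer:"


if __name__ == "__main__":
    print(build_cocot_prompt("Is this action appropriate?", ["yes", "no"]))
```

The resulting string would be sent to a VLM together with the image. The point of the sketch is only that each stage is requested explicitly and in a fixed order, whereas the flat baseline leaves the structure of the reasoning to the model.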
