LG | Mar 17

The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models

arXiv:2603.16728 | 95.01 citations | h-index: 2

AI Analysis

This work addresses a critical problem for deploying vision-language models in high-stakes settings, where reliable uncertainty estimation is essential, and reveals a significant drawback of widely used reasoning methods.

The study finds that chain-of-thought reasoning in vision-language models consistently degrades the reliability of most uncertainty estimates by inducing overconfidence, even when it improves task accuracy, and identifies implicit answer conditioning as the key mechanism.

Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
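
To make the contrast in the abstract concrete, the following is a minimal sketch of the two families of uncertainty estimates it mentions: a token-probability score and an agreement-based consistency score. It is not the paper's implementation. The function names, the string normalization, and the use of a geometric-mean token probability are all assumptions chosen for illustration, and the inputs (sampled answers and per-token log-probabilities) are assumed to come from whatever VLM interface is in use.

```python
import math
from collections import Counter


def agreement_confidence(answers):
    """Agreement-based consistency (illustrative, hypothetical helper).

    Returns the most frequent answer among independently sampled
    responses and the fraction of samples that agree with it.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Assumption: light normalization is enough to compare answers.
    normalized = [a.strip().lower() for a in answers]
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    return top_answer, top_count / len(normalized)


def token_probability_confidence(answer_token_logprobs):
    """Token-probability confidence (illustrative, hypothetical helper).

    Returns the geometric mean of the answer-token probabilities,
    computed from their natural-log probabilities.
    """
    if not answer_token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(answer_token_logprobs) / len(answer_token_logprobs)
    return math.exp(mean_logprob)


if __name__ == "__main__":
    # Four hypothetical samples for the same image-question pair.
    samples = ["Cat", "cat ", "dog", "cat"]
    print(agreement_confidence(samples))

    # Hypothetical log-probabilities of the final-answer tokens.
    print(token_probability_confidence([math.log(0.5), math.log(0.5)]))
```

Under the mechanism the abstract describes, the second score is computed on answer tokens that are generated after the reasoning trace, so it can be high simply because the answer is consistent with that trace. The first score compares answers across separate samples and does not read token probabilities at all.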

