CL | LG | ML | Feb 24, 2025

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

arXiv:2502.17214v2 | 15 citations | h-index: 5 | Has Code | ACL
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of detecting misinformation and ensuring reliable decision-making with LLMs, though the contribution appears incremental because it builds on existing Chain-of-Thought approaches.

The paper tackles the problem of inaccurate uncertainty quantification in large language models by proposing CoT-UQ, a response-wise framework that leverages Chain-of-Thought reasoning, achieving an average 5.9% AUROC improvement over existing methods on logical and mathematical reasoning tasks.

Large language models (LLMs) excel in many tasks but struggle to accurately quantify uncertainty in their generated responses. This limitation makes it challenging to detect misinformation and ensure reliable decision-making. Existing uncertainty quantification (UQ) methods for LLMs are primarily prompt-wise rather than response-wise, often requiring multiple response samples, which incurs high computational costs. Moreover, LLMs have been shown to be overconfident, particularly when using reasoning steps to derive their answers. In this work, we propose CoT-UQ, a response-wise UQ framework that integrates LLMs' inherent reasoning capabilities through Chain-of-Thought (CoT) into the UQ process. CoT-UQ captures critical information during inference by extracting keywords from each reasoning step and assessing their importance to the final answer. This key reasoning information is then aggregated to produce a final uncertainty estimate. We conduct extensive experiments on the Llama family, with model sizes ranging from 8B to 13B, across logical and mathematical reasoning tasks. Experimental results demonstrate that CoT-UQ significantly outperforms existing UQ methods, with an average improvement of 5.9% AUROC. The code is available at: https://github.com/ZBox1005/CoT-UQ.
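
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of its last two ideas under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes each extracted keyword already carries a model probability and an importance score, combines them with a simple importance-weighted mean, and then computes AUROC by checking how often a correct response receives higher confidence than an incorrect one. The `Keyword` class, the `aggregate_confidence` and `auroc` functions, and the weighted-mean rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Keyword:
    """A keyword extracted from one reasoning step (hypothetical structure)."""
    text: str
    probability: float  # assumed model probability of the keyword's tokens, in [0, 1]
    importance: float   # assumed importance of the keyword to the final answer, >= 0


def aggregate_confidence(keywords: List[Keyword]) -> float:
    """Importance-weighted mean of keyword probabilities (an assumed aggregation rule).

    Returns 0.5, an arbitrary neutral default, when there is nothing to weight.
    """
    total_weight = sum(k.importance for k in keywords)
    if total_weight == 0:
        return 0.5
    return sum(k.importance * k.probability for k in keywords) / total_weight


def auroc(confidences: List[float], correct: List[bool]) -> float:
    """AUROC as the fraction of (correct, incorrect) pairs ranked in the right order.

    A tie between a correct and an incorrect response counts as half a win.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative inputs only; these numbers are made up, not taken from the paper.
example_keywords = [Keyword("12", 0.9, 3.0), Keyword("apples", 0.5, 1.0)]
example_confidence = aggregate_confidence(example_keywords)
example_auroc = auroc([0.9, 0.8, 0.3, 0.4], [True, True, False, False])
```

Uncertainty is simply one minus the aggregated confidence in this sketch. The pairwise AUROC is quadratic in the number of responses, which is fine for illustration; a library routine would be the usual choice for a real evaluation.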

Code Implementations: 1 repo
