CL · Mar 17

Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

Juming Xiong, Kevin Guo, Congning Ni, Chao Yan, Katherine Brown, Avinash Baidya, Xiang Gao, Bradley Malin, Zhijun Yin

arXiv:2603.08999 · 19.52 citations · h-index: 7

Predicted impact: top 80% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses inference efficiency in LLM reasoning for applications such as medical and math QA; it is an incremental advance that builds on self-consistency methods.

The paper tackles the high inference cost of chain-of-thought reasoning in large language models, where reasoning paths are often unnecessarily long, by introducing a confidence-aware decision framework that maintains comparable accuracy while using up to 80% fewer tokens.

Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
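
The decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `sample_path` and `confidence` are hypothetical callables (the latter stands in for the trained model over sentence-level features), and the `threshold` and `extra_paths` values are assumptions chosen only for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def adaptive_answer(
    question: str,
    sample_path: Callable[[str], Tuple[str, str]],  # hypothetical: returns (reasoning, answer)
    confidence: Callable[[str], float],             # hypothetical: scores one finished trajectory
    threshold: float = 0.8,                         # assumed value, not from the paper
    extra_paths: int = 4,                           # assumed value, not from the paper
) -> Tuple[str, int]:
    """Answer with one path when confident, otherwise fall back to self-consistency.

    Returns the chosen answer and the number of reasoning paths sampled.
    """
    reasoning, answer = sample_path(question)

    # Single-path branch: the completed trajectory looks reliable enough.
    if confidence(reasoning) >= threshold:
        return answer, 1

    # Multi-path branch: sample more trajectories and aggregate by majority vote.
    answers = [answer] + [sample_path(question)[1] for _ in range(extra_paths)]
    return majority_vote(answers), 1 + extra_paths
```

Under these assumptions, token savings come from how often the single-path branch is taken; the extra sampling cost is paid only for questions whose first trajectory scores below the threshold.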
