AI · Sep 24, 2025

Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving

Anisha Garg, Engin Tekin, Yash More, David Bick, Nishit Neema, Ganesh Venkatesh

arXiv:2509.19681v1 · 1 citation · h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in scaling reasoning models and offers an incremental improvement over existing test-time methods.

The paper tackles the problem of poor self-evaluation in reasoning models by proposing an Explanatory Verifier that produces calibrated confidence scores and natural language reasoning, improving the accuracy and efficiency of test-time strategies like best-of-n and self-reflection.

Advanced test-time computing strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained via reinforcement learning (GRPO), that produces calibrated confidence scores and associated natural language reasoning for generated solutions. Our verifier improves the accuracy and efficiency of test-time strategies like best-of-n and self-reflection. Crucially, it excels at identifying challenging failure modes, such as when both candidate solutions are identically incorrect, succeeding where standard methods like majority voting fail.
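The abstract does not say how the verifier's pairwise confidence scores are combined during best-of-n, so the sketch below is illustrative only. It assumes a round-robin scheme: every pair of candidates is scored, each candidate's confidences are averaged, and the selector abstains when no candidate clears a fixed threshold. That abstention is the case majority voting cannot detect. `PairwiseVerifier`, `best_of_n`, `toy_verifier`, and `threshold` are hypothetical names, and `toy_verifier` is a hard-coded stand-in for the GRPO-trained model, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: given two candidate solutions, return a calibrated
# probability that each one is correct. This is NOT the paper's actual API.
PairwiseVerifier = Callable[[str, str], Tuple[float, float]]


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer (ties go to the first one encountered)."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(
    solutions: Sequence[str],
    verify: PairwiseVerifier,
    threshold: float = 0.5,  # hypothetical abstention cut-off
) -> Optional[int]:
    """Pick a solution by averaging its confidence over all pairwise comparisons.

    Returns the index of the chosen solution, or None if no candidate reaches
    `threshold` (e.g. as a signal to run another round of self-reflection).
    """
    n = len(solutions)
    if n < 2:
        raise ValueError("pairwise verification needs at least two candidates")
    totals = [0.0] * n
    counts = [0] * n
    # Round-robin: score every unordered pair of candidates once.
    for i, j in combinations(range(n), 2):
        p_i, p_j = verify(solutions[i], solutions[j])
        totals[i] += p_i
        totals[j] += p_j
        counts[i] += 1
        counts[j] += 1
    scores = [t / c for t, c in zip(totals, counts)]
    best = max(range(n), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


def toy_verifier(conf: Dict[str, float]) -> PairwiseVerifier:
    """Stand-in verifier that looks up a fixed confidence for each solution."""
    def verify(a: str, b: str) -> Tuple[float, float]:
        return conf[a], conf[b]
    return verify


if __name__ == "__main__":
    # Suppose the correct answer is "x = 4".
    mixed = ["x = 3", "x = 3", "x = 4"]
    print(majority_vote(mixed))  # "x = 3": the wrong majority wins
    print(best_of_n(mixed, toy_verifier({"x = 3": 0.2, "x = 4": 0.9})))  # 2

    # Every candidate is identically incorrect.
    same_wrong = ["x = 3", "x = 3", "x = 3"]
    print(majority_vote(same_wrong))  # "x = 3", reported with full agreement
    print(best_of_n(same_wrong, toy_verifier({"x = 3": 0.1})))  # None: abstain
```

Averaging over all pairs costs n(n-1)/2 verifier calls. A knockout tournament would need only n-1 calls, but each candidate would then be judged against fewer opponents. The abstention branch is what allows a calibrated verifier to flag the "all candidates agree but are wrong" case instead of returning it.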
