Robust Search with Uncertainty-Aware Value Models for Language Model Reasoning
This work is aimed at AI researchers concerned with the robustness of LLM reasoning: it systematically integrates uncertainty quantification into search to mitigate pruning errors.
The paper tackles verifier failure in value-model-guided search for language model reasoning. It proposes an uncertainty-aware framework that combines value distributions with Group Thompson Sampling, which significantly boosts solution coverage on out-of-distribution problems such as AIME25 and Minerva Math.
Value model (VM) guided search is effective in steering LLM generation but lacks robustness. The cause is verifier failure: imperfect VMs mistakenly prune valid reasoning paths, especially when they encounter unseen reasoning paths generated during search. To address this, we propose an uncertainty-aware framework with two key components: (1) Uncertainty-Aware Value Models (UVMs), which replace single-point value estimates with value distributions to quantify prediction reliability, and (2) Group Thompson Sampling, an efficient algorithm that selects candidates based on their probability of being optimal. Experiments on two in-distribution (ID) settings (GSM8K, MATH) and three out-of-distribution (OOD) settings (e.g., AIME25, Minerva Math) show that our method significantly mitigates verifier failure and boosts solution coverage, especially on OOD problems. This work provides the first systematic integration of uncertainty quantification into LLM search paradigms, enhancing robustness. The code is released at https://github.com/FreedomIntelligence/UVM.
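To make the idea of selecting candidates by their probability of being optimal concrete, the following is a minimal illustrative sketch, not the released implementation. It assumes each candidate's value distribution is summarized as a Gaussian with a mean and a standard deviation, and that a group of k candidates is built by repeating one Thompson draw per slot without replacement. The function name `group_thompson_sample` and its parameters are hypothetical.

```python
import random


def group_thompson_sample(means, stds, k, seed=None):
    """Pick up to k distinct candidate indices by repeated Thompson draws.

    Assumption (not from the paper): each candidate's value distribution
    is a Gaussian N(means[i], stds[i]**2).
    """
    rng = random.Random(seed)
    remaining = list(range(len(means)))
    chosen = []
    while remaining and len(chosen) < k:
        # One draw per remaining candidate. The argmax of the draws equals
        # candidate i with the probability that i is the best one under
        # the assumed distributions.
        draws = {i: rng.gauss(means[i], stds[i]) for i in remaining}
        best = max(draws, key=draws.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Candidate 2 has a lower mean but high uncertainty, so it can still
    # be selected, whereas ranking by the mean alone would always prune it.
    means = [0.50, 0.48, 0.40]
    stds = [0.01, 0.01, 0.50]
    print(group_thompson_sample(means, stds, k=2, seed=0))
```

With all standard deviations set to zero the procedure reduces to ordinary top-k selection by point estimate, which is the setting in which a confidently wrong value model prunes a valid path with no chance of recovery.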