LG · Feb 6

Adaptive Uncertainty-Aware Tree Search for Robust Reasoning

Zeen Song, Zihao Ma, Wenwen Qiang, Changwen Zheng, Gang Hua

arXiv:2602.06493v1 · 1.4 · h-index: 13

Originality: Incremental advance

AI Analysis

This addresses a fundamental limitation in inference-time reasoning scaling for LLMs, though it appears incremental as it builds on existing PRM frameworks.

The paper tackles epistemic uncertainty in Process Reward Models (PRMs) when they evaluate out-of-distribution (OOD) reasoning paths. It proposes Uncertainty-Aware Tree Search (UATS), which estimates uncertainty with Monte Carlo Dropout and uses a reinforcement learning controller to allocate compute budget. In experiments, the authors report that this effectively reduces the impact of OOD errors.
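A minimal sketch of the Monte Carlo Dropout idea, assuming a toy linear scorer with a sigmoid output as a stand-in for a PRM. A real PRM would be a neural network with dropout in its hidden layers, and the names `prm_score` and `mc_dropout_estimate`, the weights, and the features are all hypothetical. Dropout stays active at inference, the same input is scored several times, and the spread of those scores serves as the uncertainty estimate.

```python
import math
import random
import statistics


def prm_score(features, weights, dropout_p, rng):
    """One stochastic forward pass of a toy linear scorer (hypothetical PRM stand-in).

    Assumes 0 <= dropout_p < 1.
    """
    logit = 0.0
    for x, w in zip(features, weights):
        if rng.random() < dropout_p:
            continue  # this unit is dropped for this pass
        # Inverted dropout: rescale kept units so the expected logit is unchanged.
        logit += w * x / (1.0 - dropout_p)
    return 1.0 / (1.0 + math.exp(-logit))  # squash to a [0, 1] step score


def mc_dropout_estimate(features, weights, dropout_p=0.2, n_samples=50, seed=0):
    """Monte Carlo Dropout: keep dropout on at inference and repeat the pass.

    Returns (mean score, population standard deviation); a large deviation
    flags a step the scorer is uncertain about.
    """
    rng = random.Random(seed)
    samples = [prm_score(features, weights, dropout_p, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    # Hypothetical features of one reasoning step and hypothetical scorer weights.
    mean, std = mc_dropout_estimate([1.0, 1.0, 1.0], [1.0, 2.0, 4.0], dropout_p=0.5)
    print(f"score={mean:.3f}  uncertainty={std:.3f}")
```

Averaging the passes gives the score. The spread is the quantity an uncertainty-aware search could act on, for example by distrusting or re-examining high-variance steps. With `dropout_p=0.0`, every pass is identical and the deviation is exactly zero.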

Inference-time reasoning scaling has significantly advanced the capabilities of Large Language Models (LLMs) in complex problem-solving. A prevalent approach involves external search guided by Process Reward Models (PRMs). However, a fundamental limitation of this framework is the epistemic uncertainty of PRMs when evaluating reasoning paths that deviate from their training distribution. In this work, we conduct a systematic analysis of this challenge. We first provide empirical evidence that PRMs exhibit high uncertainty and unreliable scoring on out-of-distribution (OOD) samples. We then establish a theoretical framework proving that while standard search incurs linear regret accumulation, an uncertainty-aware strategy can achieve sublinear regret. Motivated by these findings, we propose Uncertainty-Aware Tree Search (UATS), a unified method that estimates uncertainty via Monte Carlo Dropout and dynamically allocates compute budget using a reinforcement learning-based controller. Extensive experiments demonstrate that our approach effectively mitigates the impact of OOD errors.
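The abstract contrasts linear regret for standard search with sublinear regret for an uncertainty-aware strategy. This sketch illustrates that contrast with a generic multi-armed bandit analogy. It is not the paper's theorem, search procedure, or RL controller. The first policy always trusts a hypothetical PRM that overrates one out-of-distribution option. The second is the classic UCB1 rule, which keeps an uncertainty bonus for rarely tried options. All function names, option values, and PRM scores are made up for illustration, and rewards are noiseless to keep the run deterministic.

```python
import math


def greedy_prm_regret(true_means, prm_scores, horizon):
    """'Standard' search analogue: commit to the PRM's top-scored option every step."""
    best = max(true_means)
    choice = max(range(len(prm_scores)), key=lambda i: prm_scores[i])
    # The same gap is paid at every step, so regret is linear in the horizon.
    return horizon * (best - true_means[choice])


def ucb1_regret(true_means, horizon):
    """Uncertainty-aware analogue: UCB1 adds an exploration bonus that shrinks
    as an option is tried more often. Rewards are noiseless here (a simplifying
    assumption), so the run is fully deterministic."""
    k = len(true_means)
    best = max(true_means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # try every option once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += true_means[arm]
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    true_means = [0.8, 0.5, 0.3]   # hypothetical true value of three candidate steps
    prm_scores = [0.7, 0.5, 0.95]  # hypothetical PRM that overrates the OOD step (index 2)
    for horizon in (1_000, 10_000):
        g = greedy_prm_regret(true_means, prm_scores, horizon)
        u = ucb1_regret(true_means, horizon)
        print(f"T={horizon}: greedy regret/T={g / horizon:.3f}, UCB1 regret/T={u / horizon:.4f}")
```

The greedy policy's per-step regret stays at the fixed gap (0.5 here). UCB1's per-step regret shrinks as the horizon grows, because each suboptimal option is tried only O(log T) times.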
