CL, AI | Oct 27, 2025

BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

Litu Ou, Kuan Li, Huifeng Yin, Liwen Zhang, Zhongwang Zhang, Xixi Wu, Rui Ye, Zile Qiao, Pengjun Xie, Jingren Zhou, Yong Jiang

arXiv:2510.23458v2 | 1 citation | h-index: 23 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses confidence estimation in complex multi-turn web agents. It is rated incremental because it builds on existing single-turn confidence methods.

The paper examines whether LLM-based web agents can communicate their confidence after multi-turn interactions. It finds that task accuracy is much higher at high confidence and near-zero at low confidence, and it proposes Test-Time Scaling methods that significantly reduce token consumption while remaining competitive with fixed-budget baselines.

Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence and near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to judge answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.
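
The retry idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Attempt` record, the `run_agent` callable, the 0-100 confidence scale, the threshold of 80, the cap of 5 attempts, and the fallback to the highest-confidence answer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One full agent rollout (hypothetical record, not from the paper)."""
    answer: str
    confidence: float  # verbalized confidence, assumed to be on a 0-100 scale
    tokens: int        # tokens consumed by this rollout


def confidence_guided_retry(
    run_agent: Callable[[str], Attempt],  # hypothetical: runs one rollout
    question: str,
    threshold: float = 80.0,  # assumed "satisfactory" confidence level
    max_attempts: int = 5,    # assumed cap so the loop always terminates
) -> Tuple[Attempt, int, List[Attempt]]:
    """Re-run the agent until its verbalized confidence reaches the threshold.

    Returns the chosen attempt, the total tokens spent, and all attempts.
    """
    attempts: List[Attempt] = []
    for _ in range(max_attempts):
        attempt = run_agent(question)
        attempts.append(attempt)
        if attempt.confidence >= threshold:
            break  # confident enough: stop early and save tokens
    # Assumed fallback: if no attempt reached the threshold, keep the
    # highest-confidence one. On early stop this is the last attempt.
    best = max(attempts, key=lambda a: a.confidence)
    total_tokens = sum(a.tokens for a in attempts)
    return best, total_tokens, attempts


if __name__ == "__main__":
    # Scripted stand-in for a real web agent, for demonstration only.
    scripted = iter([
        Attempt("Paris", 30.0, 1200),
        Attempt("Lyon", 55.0, 900),
        Attempt("Paris", 90.0, 1100),
        Attempt("Paris", 95.0, 1000),  # never reached: loop stops earlier
    ])
    best, total, tried = confidence_guided_retry(lambda q: next(scripted), "demo question")
    print(best.answer, best.confidence, total, len(tried))
```

Unlike a fixed-budget scheme that always runs the same number of rollouts, this loop stops as soon as one rollout is confident enough, which is where the token savings would come from under these assumptions.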
