CL · AI · Oct 27, 2025

BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

arXiv:2510.23458v2 · 1 citation · h-index: 23 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses confidence estimation in complex multi-turn web agents. It is an incremental advance because it builds on existing single-turn confidence methods.

The paper examines whether LLM-based web agents can communicate their confidence after multi-turn interactions. It finds that task accuracy is much higher at high confidence and near zero at low confidence, and it proposes Test-Time Scaling methods that significantly reduce token consumption while remaining competitive with fixed-budget baselines.

Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence while having near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.
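
The retry idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `confidence_guided_retry`, the `run_agent` callable (assumed to return an answer, a verbalized confidence in [0, 1], and the tokens it consumed), the `threshold` and `max_attempts` values, and the fallback of returning the most confident answer when no attempt clears the threshold are all assumptions made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical rollout result: (answer, verbalized confidence in [0, 1], tokens used).
Rollout = Tuple[str, float, int]


def confidence_guided_retry(
    run_agent: Callable[[str], Rollout],
    question: str,
    threshold: float = 0.8,   # assumed cutoff, not taken from the paper
    max_attempts: int = 5,    # assumed retry budget, not taken from the paper
) -> Tuple[str, float, int, int]:
    """Re-run the agent until its stated confidence reaches the threshold.

    Returns (answer, confidence, attempts_used, tokens_used). If no attempt
    reaches the threshold, the most confident answer seen is returned
    (an assumption of this sketch).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    best_answer, best_conf = "", -1.0
    tokens_used = 0
    attempts_used = 0
    for attempts_used in range(1, max_attempts + 1):
        answer, conf, tokens = run_agent(question)
        tokens_used += tokens
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= threshold:
            break  # confident enough: stop spending tokens
    return best_answer, best_conf, attempts_used, tokens_used


def scripted_agent(rollouts):
    """Stand-in for a real web agent: replays a fixed list of rollouts."""
    it = iter(rollouts)
    return lambda question: next(it)


agent = scripted_agent([("Paris?", 0.3, 1200), ("Lyon", 0.5, 900), ("Paris", 0.9, 1100)])
result = confidence_guided_retry(agent, "Capital of France?", threshold=0.8, max_attempts=5)
print(result)  # ('Paris', 0.9, 3, 3200)
```

In this sketch the loop stops after the third rollout even though five attempts were allowed. That early exit is where the token savings over a fixed-budget scheme, which would always spend all five attempts, would come from.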

