LG | AI | Jun 9, 2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

CMU
arXiv:2506.07976v2 | 33 citations | h-index: 51 | Has Code
AI Analysis

This work addresses the problem of enabling adaptive behavior in AI agents for interactive domains like web navigation, offering a novel scaling dimension that complements existing methods.

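The paper addresses a limitation of test-time scaling in agent problems: generating longer reasoning traces before each action does not let the agent gather new information from the environment. It proposes scaling test-time interaction instead, lengthening the agent's interaction horizon so that exploration, backtracking, and dynamic re-planning can happen within a single rollout. Prompting-based interaction scaling alone, without any training, already improves task success on web benchmarks. TTI, a curriculum-based online RL approach applied to a Gemma 3 12B model, then produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena.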

The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.
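The idea of adaptively adjusting rollout lengths can be illustrated with a minimal sketch. The abstract does not specify the schedule, so everything here is an assumption made for illustration: the linear ramp, the names `HorizonCurriculum`, `ToyEnv`, and `rollout`, and all default values are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class HorizonCurriculum:
    """Hypothetical schedule for the maximum interaction steps per rollout."""
    h_min: int = 10          # assumed starting horizon
    h_max: int = 30          # assumed final horizon
    warmup_iters: int = 100  # assumed iterations needed to reach h_max

    def horizon(self, iteration: int) -> int:
        # Linear ramp from h_min to h_max, then hold at h_max.
        frac = min(max(iteration, 0) / self.warmup_iters, 1.0)
        return self.h_min + round(frac * (self.h_max - self.h_min))


class ToyEnv:
    """Stand-in environment: the task succeeds after `steps_needed` steps."""

    def __init__(self, steps_needed: int):
        self.steps_needed = steps_needed
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.steps_needed
        reward = 1.0 if done else 0.0
        return self.t, reward, done


def rollout(policy, env, max_steps: int) -> float:
    """Run one episode, cutting it off after `max_steps` interaction steps."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    curriculum = HorizonCurriculum()
    env = ToyEnv(steps_needed=15)
    for it in (0, 50, 100):
        h = curriculum.horizon(it)
        print(it, h, rollout(lambda obs: "noop", env, max_steps=h))
```

In this toy setting a task that needs 15 steps fails under the initial horizon and succeeds once the curriculum has lengthened it, which is the kind of effect interaction scaling targets. A real training loop would feed the collected rollouts into an RL update, which is omitted here.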

Code Implementations: 1 repo