AI · Feb 13

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

arXiv:2602.12544v1 · 2 citations · h-index: 18 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently creating diverse, realistic web interaction datasets for training web agents. Its contribution is an incremental advance in data generation and evaluation.

The authors tackled the problem of generating high-quality training data for web agents by introducing a constraint-based evaluation framework for fine-grained assessment of task progress, enabling the use of partially successful trajectories. Their distilled student model outperformed open-source approaches and matched or exceeded commercial systems on a new benchmark called BookingArena, while being significantly smaller.

We present a scalable pipeline for automatically generating high-quality training data for web agents. In particular, a major challenge in identifying high-quality training instances is trajectory evaluation: quantifying how much progress was made towards task completion. We introduce a novel constraint-based evaluation framework that provides fine-grained assessment of progress towards task completion. This enables us to leverage partially successful trajectories, which significantly expands the amount of usable training data. We evaluate our method on BookingArena, a new benchmark we propose that consists of complex booking tasks across 20 popular websites, and demonstrate that our distilled student model outperforms open-source approaches and matches or exceeds commercial systems, while being a significantly smaller model. Our work addresses the challenge of efficiently creating diverse, realistic web interaction datasets and provides a systematic evaluation methodology for complex structured web tasks.
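
To make the idea of constraint-based partial credit concrete, here is a minimal sketch. It is not the authors' implementation: the `Constraint` class, the `progress_score` and `select_trajectories` functions, the flat dictionary used as the final page state, the example booking fields, and the 0.5 threshold are all assumptions chosen for illustration. The sketch scores a trajectory by the fraction of task constraints its final state satisfies, then keeps trajectories whose score clears a threshold, so that partially successful runs can still be used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: the final page state as field -> value.
State = Dict[str, str]


@dataclass
class Constraint:
    """One checkable requirement of a task (illustrative, not from the paper)."""
    name: str
    check: Callable[[State], bool]


def progress_score(constraints: List[Constraint], final_state: State) -> float:
    """Return the fraction of constraints satisfied by the final state."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for c in constraints if c.check(final_state))
    return satisfied / len(constraints)


def select_trajectories(
    scored: List[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Keep trajectory ids whose score meets the (assumed) threshold."""
    return [traj_id for traj_id, score in scored if score >= threshold]


# Hypothetical booking task with four constraints.
constraints = [
    Constraint("destination", lambda s: s.get("destination") == "Paris"),
    Constraint("check_in", lambda s: s.get("check_in") == "2025-06-01"),
    Constraint("guests", lambda s: s.get("guests") == "2"),
    Constraint("confirmed", lambda s: s.get("confirmed") == "yes"),
]

# One run filled in every field but never confirmed; another went off track.
partial_state = {"destination": "Paris", "check_in": "2025-06-01", "guests": "2"}
failed_state = {"destination": "Rome"}

score_partial = progress_score(constraints, partial_state)
score_failed = progress_score(constraints, failed_state)

kept = select_trajectories(
    [("partial", score_partial), ("failed", score_failed)], threshold=0.5
)
print(score_partial, score_failed, kept)
```

Under a binary success criterion both runs would be discarded. With a graded score, the run that satisfied three of four constraints is retained, which is the mechanism the abstract credits for expanding the pool of usable training data.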

