Go-Browse: Training Web Agents with Structured Exploration
Go-Browse addresses the challenge of training efficient web agents for tasks such as navigation, though the contribution is incremental because it builds on existing benchmarks and models.
The paper tackles the problem of web browsing agents getting lost in unfamiliar websites. It proposes Go-Browse, a method for collecting diverse web agent data through structured exploration. A 7B parameter language model fine-tuned on the collected data achieves a 21.7% success rate on the WebArena benchmark, beating GPT-4o mini by 2.4%.
One of the fundamental problems in digital agents is their lack of understanding of their environment. For instance, a web browsing agent may get lost in unfamiliar websites, uncertain what pages must be visited to achieve its goals. To address this, we propose Go-Browse, a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. Go-Browse achieves efficient exploration by framing data collection as a graph search, enabling reuse of information across exploration episodes. We instantiate our method on the WebArena benchmark, collecting a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. Fine-tuning a 7B parameter language model on this dataset achieves a success rate of 21.7% on the WebArena benchmark, beating GPT-4o mini by 2.4% and exceeding current state-of-the-art results for sub-10B parameter models by 2.9%.
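To make the graph-search framing concrete, here is a minimal sketch of exploration as a traversal over pages, where URLs are nodes and links are edges. It is not the paper's implementation: the breadth-first order, the function `explore`, the callbacks `get_links` and `collect`, and the toy site `TOY_SITE` are all assumptions made for illustration. The point it shows is how a shared frontier and a `seen` set let pages discovered in one episode be reused by later ones instead of being rediscovered.

```python
from collections import deque
from typing import Callable, Dict, List


def explore(
    start_url: str,
    get_links: Callable[[str], List[str]],   # hypothetical: links found on a page
    collect: Callable[[str], List[str]],     # hypothetical: data gathered at a page
    max_pages: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first exploration that visits each URL at most once."""
    frontier = deque([start_url])  # pages discovered but not yet explored
    seen = {start_url}             # every page ever queued, shared across episodes
    data: Dict[str, List[str]] = {}

    while frontier and len(data) < max_pages:
        url = frontier.popleft()
        data[url] = collect(url)   # one exploration episode rooted at this page
        for nxt in get_links(url):
            if nxt not in seen:    # skip pages that an earlier episode already found
                seen.add(nxt)
                frontier.append(nxt)
    return data


# Toy stand-in for a website (assumption, for illustration only).
TOY_SITE = {
    "/": ["/products", "/cart"],
    "/products": ["/products/1", "/"],
    "/cart": ["/checkout"],
    "/products/1": ["/cart"],
    "/checkout": [],
}

result = explore(
    "/",
    get_links=lambda u: TOY_SITE.get(u, []),
    collect=lambda u: [f"visited {u}"],
    max_pages=4,
)
print(list(result))
```

In a real setting, `collect` would stand in for whatever proposes and attempts tasks on a page, and `max_pages` plays the role of a budget on the number of URLs explored.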