AI · Jun 3, 2025

Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

Harvard, Stanford
arXiv:2506.02865v2 · 8 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses the need for affordable, effective web automation tools for researchers and developers, though it appears incremental because it builds on existing VLM and agent frameworks.

The paper tackles the problem of building cost-efficient web agents. It introduces Surfer-H, an agent that integrates Vision-Language Models (VLMs) to carry out user-defined web tasks, and Holo1, a new open-weight VLM collection specialized in web navigation. Together they reach a state-of-the-art 92.2% on WebVoyager.

We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLMs) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 tops generalist User Interface (UI) benchmarks as well as our new web UI localization benchmark, WebClick. When powered by Holo1, Surfer-H achieves state-of-the-art performance of 92.2% on WebVoyager, striking a Pareto-optimal balance between accuracy and cost-efficiency. To accelerate research advancement in agentic systems, we are open-sourcing both our WebClick evaluation dataset and the Holo1 model weights.
