AI
Oct 22, 2025

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Cambridge
arXiv:2510.19949v2
4 citations
h-index: 6
Originality: Highly original
AI Analysis

The paper addresses cross-platform computer use for AI agents, and is rated a significant advance rather than an incremental improvement.

The paper tackles the challenge of building agents that generalize across web, desktop, and mobile environments. It introduces Surfer 2, a unified architecture operating purely from visual observations, and reports state-of-the-art results on all four benchmarks evaluated: 97.1% on WebVoyager, 69.6% on WebArena, 60.1% on OSWorld, and 87.1% on AndroidWorld. When allowed multiple attempts, it exceeds human performance on all of them.

Building agents that generalize across web, desktop, and mobile environments remains an open challenge, as prior systems rely on environment-specific interfaces that limit cross-platform deployment. We introduce Surfer 2, a unified architecture operating purely from visual observations that achieves state-of-the-art performance across all three environments. Surfer 2 integrates hierarchical context management, decoupled planning and execution, and self-verification with adaptive recovery, enabling reliable operation over long task horizons. Our system achieves 97.1% accuracy on WebVoyager, 69.6% on WebArena, 60.1% on OSWorld, and 87.1% on AndroidWorld, outperforming all prior systems without task-specific fine-tuning. With multiple attempts, Surfer 2 exceeds human performance on all benchmarks. These results demonstrate that systematic orchestration amplifies foundation model capabilities and enables general-purpose computer control through visual interaction alone, while calling for a next-generation vision language model to achieve Pareto-optimal cost-efficiency.
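
To make the "decoupled planning and execution" and "self-verification with adaptive recovery" ideas concrete, here is a minimal sketch of such a control loop. It is not the paper's implementation: the function names (`run_task`, `toy_plan`, `toy_execute`, `toy_verify`), the retry limit, and the string-based observations are all hypothetical stand-ins chosen for illustration, and a real agent would use model calls and screenshots in their place.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    subgoal: str
    succeeded: bool
    attempts: int


def run_task(
    plan: Callable[[str], List[str]],
    execute: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    task: str,
    max_retries: int = 2,
) -> List[StepResult]:
    """Plan once, then execute and verify each subgoal, retrying on failure."""
    results: List[StepResult] = []
    for subgoal in plan(task):  # planning is separate from execution
        attempts = 0
        succeeded = False
        while attempts <= max_retries and not succeeded:
            attempts += 1
            observation = execute(subgoal, attempts)  # act on the environment
            succeeded = verify(subgoal, observation)  # self-verification
        results.append(StepResult(subgoal, succeeded, attempts))
        if not succeeded:
            break  # recovery budget exhausted; stop the task
    return results


# Hypothetical toy components, standing in for model and environment calls.
def toy_plan(task: str) -> List[str]:
    return [part.strip() for part in task.split(" then ")]


def toy_execute(subgoal: str, attempt: int) -> str:
    # Simulate an action that fails on its first attempt only.
    if "flaky" in subgoal and attempt == 1:
        return "error: element not found"
    return f"done: {subgoal}"


def toy_verify(subgoal: str, observation: str) -> bool:
    return observation.startswith("done")


results = run_task(toy_plan, toy_execute, toy_verify, "open site then flaky click")
```

In this toy run the second subgoal fails verification once and succeeds on retry, which is the simplest form of the recover-after-verification behaviour the abstract describes.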

