Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

arXiv:2605.0676194.2

Predicted impact: top 13% in AI · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the scalability and reproducibility bottleneck in training visual web agents, a bottleneck that must be overcome to advance autonomous web navigation.

Weblica introduces a framework for building reproducible, scalable web environments using HTTP-level caching and LLM-based environment synthesis, enabling RL training across thousands of diverse environments and tasks. The resulting Weblica-8B model outperforms open-weight baselines of similar size on web navigation benchmarks and is competitive with API models.

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, and thus fail to capture the diversity of the web. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.
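The abstract does not say how the HTTP-level cache is organized. As a rough illustration only, the following Python sketch shows one common record/replay design. It assumes that responses are keyed by HTTP method, normalized URL and a hash of the request body. `CachedResponse`, `HttpReplayCache`, `request_key` and the `VOLATILE_PARAMS` set are hypothetical names and do not come from Weblica. In practice, a cache like this would typically sit in a proxy between the browser and the network.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


@dataclass(frozen=True)
class CachedResponse:
    status: int
    headers: Tuple[Tuple[str, str], ...]
    body: bytes


# Hypothetical: query parameters assumed to be volatile (cache busters,
# timestamps) and therefore excluded from the cache key.
VOLATILE_PARAMS = {"_", "ts", "cb"}


def normalize_url(url: str) -> str:
    """Drop volatile query params and sort the rest so equivalent URLs share a key.

    The fragment is discarded because browsers never send it to the server.
    """
    parts = urlsplit(url)
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in VOLATILE_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def request_key(method: str, url: str, body: bytes = b"") -> str:
    """Key a request by HTTP method, normalized URL, and a hash of its body."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{method.upper()} {normalize_url(url)} {body_hash}"


@dataclass
class HttpReplayCache:
    """Record responses from a live fetcher once, then serve them verbatim."""

    fetch: Optional[Callable[[str, str, bytes], CachedResponse]] = None
    store: Dict[str, CachedResponse] = field(default_factory=dict)
    replay: bool = False  # set to True once the snapshot is frozen for training

    def handle(self, method: str, url: str, body: bytes = b"") -> CachedResponse:
        key = request_key(method, url, body)
        if key in self.store:
            return self.store[key]
        if self.replay or self.fetch is None:
            # Unseen request during replay: fail closed instead of hitting the
            # live web, so every episode sees the same environment.
            return CachedResponse(404, (("content-type", "text/plain"),), b"not cached")
        response = self.fetch(method, url, body)
        self.store[key] = response
        return response
```

In this sketch, hashing the request body is what preserves interactive behavior under replay. A form submitted with different values maps to a different recorded response. Returning a 404 for unseen requests, rather than falling back to the live site, is one way to keep visual states stable across RL episodes. The paper may handle cache misses differently.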

