CL · AI · Nov 22, 2024

ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

arXiv:2411.15004v2 · 32 citations · h-index: 51 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective web agents in specialized domains, though the advance is incremental: it applies existing fine-tuning methods to new data.

The paper tackles the problem of LLM agents struggling with specialized web contexts and long-horizon planning by fine-tuning open-source LLMs on production-scale workflow data. It achieves state-of-the-art direct generation performance on Mind2Web and improves the task success rate on WebArena by 7.3% over the previous best text-only web agents.

Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their planning abilities. However, general-purpose LLMs are not specifically trained to understand specialized web contexts such as HTML, and they often struggle with long-horizon planning. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data collected from over 250 domains corresponding to 6 billion tokens. This simple yet effective approach shows substantial gains over prompting-based agents on existing benchmarks -- ScribeAgent achieves state-of-the-art direct generation performance on Mind2Web and improves the task success rate by 7.3% over the previous best text-only web agents on WebArena. We further perform detailed ablation studies on various fine-tuning design choices and provide insights into LLM selection, training recipes, context window optimization, and effect of dataset sizes.

Code Implementations: 1 repo