LG | AI | CL | Jul 24, 2023

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

arXiv:2307.12856v4 | 373 citations | h-index: 40
Originality: Incremental advance
AI Analysis

This paper addresses autonomous web automation for users who need to interact with complex websites. It represents a strong, specific gain rather than a broad paradigm shift.

The paper targets the poor performance of LLM-driven agents on real-world websites, which it attributes to three factors: open-domain tasks, limited context length, and a lack of inductive bias on HTML. It introduces WebAgent, a modular agent that improves success on real websites by over 50%. Its HTML-T5 component achieves an 18.7% higher success rate than the prior method on the MiniWoB benchmark.

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.
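The plan, summarize, and act steps described in the abstract can be pictured as a simple control loop. The following is a minimal sketch, not the authors' implementation: every name in it (`run_agent`, `Episode`, the four callables, the `"DONE"` stop signal, and `max_steps`) is a hypothetical stand-in. In the paper, planning and summarization are handled by HTML-T5 and program synthesis by Flan-U-PaLM; here they are plain Python callables so the sketch runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not the paper's API):
#   plan(instruction, past_sub_instructions, html) -> next sub-instruction
#   summarize(instruction, sub_instruction, html)  -> task-relevant HTML snippet
#   synthesize(sub_instruction, snippet)           -> Python program as source text
#   execute(program)                               -> HTML of the page after acting
Planner = Callable[[str, List[str], str], str]
Summarizer = Callable[[str, str, str], str]
Synthesizer = Callable[[str, str], str]
Executor = Callable[[str], str]


@dataclass
class Episode:
    """Record of what the agent decided and generated at each step."""
    sub_instructions: List[str] = field(default_factory=list)
    programs: List[str] = field(default_factory=list)


def run_agent(
    instruction: str,
    html: str,
    plan: Planner,
    summarize: Summarizer,
    synthesize: Synthesizer,
    execute: Executor,
    max_steps: int = 10,       # assumed step budget
    stop_signal: str = "DONE",  # assumed end-of-task marker
) -> Episode:
    episode = Episode()
    for _ in range(max_steps):
        # 1. Plan: decompose the instruction into the next sub-instruction.
        sub = plan(instruction, episode.sub_instructions, html)
        if sub == stop_signal:
            break
        # 2. Summarize: reduce the long HTML page to a relevant snippet.
        snippet = summarize(instruction, sub, html)
        # 3. Act: generate a program from the sub-instruction and snippet,
        #    then run it to obtain the next page.
        program = synthesize(sub, snippet)
        html = execute(program)
        episode.sub_instructions.append(sub)
        episode.programs.append(program)
    return episode
```

With toy callables (for example, a planner that returns one sub-instruction and then `"DONE"`), the loop records a single step and stops. The point of the modular split is that each stage sees only what it needs: the code generator receives a short snippet rather than the full page.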

