CL · Apr 4, 2024

AutoWebGLM: A Large Language Model-based Web Navigating Agent

Tsinghua
arXiv:2404.03648v2 · 145 citations · h-index: 36 · Has Code · KDD
Originality: Incremental advance
AI Analysis

The work addresses inefficiencies in automated web navigation for users who need reliable AI agents, though it appears incremental: it builds on existing LLM methods with specific enhancements.

The paper tackles the problem of poor performance of large language model-based web agents in real-world navigation tasks by developing AutoWebGLM, which outperforms GPT-4 on benchmarks like AutoWebBench.

Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-6B. AutoWebGLM can serve as a powerful automated web navigation agent that outperforms GPT-4. Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages succinctly while preserving vital information. We then employ a hybrid human-AI method to build web browsing data for curriculum training. Finally, we bootstrap the model with reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition on its own. For comprehensive evaluation, we establish a bilingual benchmark, AutoWebBench, for real-world web navigation tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, demonstrating its potential to tackle challenging tasks in real environments. Related code, model, and data are released at https://github.com/THUDM/AutoWebGLM.
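
To make the idea of HTML simplification concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details are not given in this abstract. It only illustrates the general approach of dropping scripts and styles, keeping visible text, and numbering interactive elements so an agent can refer to them. The tag and attribute lists (`KEEP_TAGS`, `KEEP_ATTRS`), the `[id] <tag>` output format, and the `simplify_html` name are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed lists for illustration only; the paper's actual rules may differ.
KEEP_TAGS = {"a", "button", "input", "select", "textarea"}  # interactive elements
SKIP_TAGS = {"script", "style", "noscript"}                 # never shown to the user
VOID_TAGS = {"input"}                                       # kept tags with no closing tag
KEEP_ATTRS = ("type", "name", "placeholder", "aria-label", "value")


class Simplifier(HTMLParser):
    """Collects visible text and numbered interactive elements, one per line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.open_idx = None  # index of the kept element currently collecting text
        self.next_id = 0      # running id assigned to interactive elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return
        # Keep only a small set of attributes that describe the element.
        kept = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
        head = f"[{self.next_id}] <{tag}" + (f" {kept}" if kept else "") + ">"
        self.next_id += 1
        self.lines.append(head)
        self.open_idx = None if tag in VOID_TAGS else len(self.lines) - 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in KEEP_TAGS:
            self.open_idx = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self.skip_depth:
            return
        if self.open_idx is not None:
            self.lines[self.open_idx] += " " + text  # label of an interactive element
        else:
            self.lines.append(text)                  # plain visible text


def simplify_html(html: str) -> str:
    """Return a compact, line-based view of the page (hypothetical helper)."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `simplify_html` on a page yields one line per text run or interactive element, and the bracketed ids give a model a short handle for actions such as clicking or typing. A real system would also need to handle visibility, viewport position, and nested interactive elements, which this sketch ignores.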

Code Implementations: 1 repo
