CL · Dec 14, 2024

WEPO: Web Element Preference Optimization for LLM-based Web Navigation

arXiv:2412.10742v1 · 9 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of aligning user intent with actions in autonomous web navigation and represents an incremental advance in the field.

The paper tackles the problem of improving LLM-based web navigation by leveraging HTML element redundancy for contrastive training, and reports state-of-the-art results on the Mind2Web benchmark, with a 13.8% improvement over WebAgent and 5.3% over CogAgent.

The rapid advancement of autonomous web navigation has significantly benefited from grounding pretrained Large Language Models (LLMs) as agents. However, current research has yet to fully leverage the redundancy of HTML elements for contrastive training. This paper introduces a novel approach to LLM-based web navigation tasks, called Web Element Preference Optimization (WEPO). WEPO uses unsupervised preference learning: it samples distance-based non-salient web elements as negative samples and optimizes a maximum-likelihood objective within Direct Preference Optimization (DPO). We evaluate WEPO on the Mind2Web benchmark and empirically demonstrate that it aligns users' high-level intent with output actions more effectively. The results show that our method achieves state-of-the-art performance, with an improvement of 13.8% over WebAgent and 5.3% over the visual language model CogAgent baseline. Our findings underscore the potential of preference optimization to enhance web navigation and other web-page-based tasks, suggesting a promising direction for future research.
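To make the preference-learning idea in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation. The `dpo_loss` function follows the standard DPO objective for a single (chosen, rejected) pair. The `nearest_negatives` helper is hypothetical: the abstract does not define its distance measure or sampling rule, so the sketch assumes "distance" means index distance in a flattened list of page elements and, purely for illustration, picks the elements closest to the target.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    Each argument is a sequence log-probability (a float). In a web navigation
    setting, "chosen" would be the action on the target element and "rejected"
    the action on a sampled negative element.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def nearest_negatives(num_elements, target_index, k):
    """Hypothetical negative sampler (an assumption, not the paper's rule).

    Treats elements as positions 0..num_elements-1 in a flattened page and
    returns the k non-target positions closest to the target, breaking ties
    by lower index.
    """
    candidates = [i for i in range(num_elements) if i != target_index]
    candidates.sort(key=lambda i: (abs(i - target_index), i))
    return candidates[:k]


# Example: pick three negatives around element 4 on a 10-element page,
# then score one pair where the policy already prefers the target.
negatives = nearest_negatives(num_elements=10, target_index=4, k=3)
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-3.0,
                ref_chosen_logp=-2.0, ref_rejected_logp=-2.0, beta=0.1)
```

When the policy and the reference model agree on both log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the target element relative to the reference.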

