CL, Feb 2

GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIs

arXiv:2602.01917v1, 1 citation, h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the labor-intensive task of maintaining in-app guidance for users of complex websites. The contribution is incremental: it introduces a benchmark and an agent that outperforms existing baselines on it.

The paper tackles the problem of automatically generating in-app guides for web UIs to reduce manual maintenance in Digital Adoption Platforms. Its GuideWeb Agent achieves 30.79% accuracy in guide target element prediction and BLEU scores of 44.94 for intent generation and 21.34 for guide-text generation.

Digital Adoption Platforms (DAPs) provide web-based overlays that deliver operation guidance and contextual hints to help users navigate complex websites. Although modern DAP tools enable non-experts to author such guidance, maintaining these guides remains labor-intensive because website layouts and functionalities evolve continuously, requiring repeated manual updates and re-annotation. In this work, we introduce GuideWeb, a new benchmark for automatic in-app guide generation on real-world web UIs. GuideWeb formulates the task as producing page-level guidance by selecting guide target elements grounded in the webpage and generating concise guide text aligned with user intent. We also propose a comprehensive evaluation suite that jointly measures the accuracy of guide target element selection and the quality of generated intents and guide texts. Experiments show that our proposed GuideWeb Agent achieves 30.79% accuracy in guide target element prediction, while obtaining BLEU scores of 44.94 for intent generation and 21.34 for guide-text generation. Existing baselines perform substantially worse, which highlights that automatic guide generation remains challenging and that further advances are necessary before such systems can be reliably deployed in real-world settings.
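As a rough illustration of the two kinds of scores reported above, the following Python sketch computes an exact-match accuracy over predicted target elements and an unsmoothed corpus-level BLEU over generated text. The abstract does not specify the matching criterion for target elements, the tokenization, or the BLEU configuration, so everything here is an assumption: the function names `element_accuracy` and `corpus_bleu` are hypothetical, elements are compared by an opaque identifier, text is split on whitespace, and each hypothesis has a single reference.

```python
import math
from collections import Counter


def element_accuracy(predicted_ids, gold_ids):
    """Fraction of examples whose predicted element ID equals the gold ID.

    Assumption: exact match on an identifier; the benchmark may use another criterion.
    """
    if len(predicted_ids) != len(gold_ids):
        raise ValueError("predicted_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted_ids, gold_ids))
    return hits / len(gold_ids)


def _ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: whitespace tokenization and uniform n-gram weights.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = _ngrams(h, n), _ngrams(r, n)
            # Clipped counts: credit an n-gram at most as often as the reference contains it.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches makes the score zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty discourages hypotheses shorter than their references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Under these assumptions, `element_accuracy(["a", "b", "c"], ["a", "x", "c"])` scores two of three predictions as correct, and `corpus_bleu` would be called once on the generated intents and once on the generated guide texts. Reported numbers from an established BLEU implementation may differ because of tokenization and smoothing choices.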

