LGCL May 21, 2025

ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search

arXiv:2505.15259v2 | 9 citations | h-index: 14 | Has Code
Originality: Highly original
AI Analysis

This paper addresses data inefficiency in GUI grounding, a concern for developers of autonomous agents, with a method that reduces reliance on large-scale web datasets.

The paper tackles accurate localization of GUI elements by MLLM-based autonomous agents. It proposes ReGUIDE, which learns data-efficiently through self-generated reasoning and spatial-aware criticism, and reports outperforming baselines across multiple benchmarks while training on only 0.2% of the samples used by the best open-source baselines.

Recent advances in Multimodal Large Language Models (MLLMs) have enabled autonomous agents to interact with computers via Graphical User Interfaces (GUIs), where accurately localizing the coordinates of interface elements (e.g., buttons) is often required for fine-grained actions. However, this remains significantly challenging, leading prior works to rely on large-scale web datasets to improve the grounding accuracy. In this work, we propose Reasoning Graphical User Interface Grounding for Data Efficiency (ReGUIDE), a novel and effective framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. More specifically, ReGUIDE learns to (i) self-generate a language reasoning process for the localization via online reinforcement learning, and (ii) criticize the prediction using spatial priors that enforce equivariance under input transformations. At inference time, ReGUIDE further boosts performance through a test-time scaling strategy, which combines spatial search with coordinate aggregation. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks, outperforming baselines with substantially fewer training data points (e.g., only 0.2% samples compared to the best open-sourced baselines).
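The two spatial ideas in the abstract, equivariance under input transformations and test-time spatial search with coordinate aggregation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the transformation is a simple crop, that aggregation is a coordinate-wise median, and that the model is wrapped in a hypothetical `predict(region)` callable returning a point in region-local pixel coordinates. All function names here are illustrative.

```python
import math
from statistics import median
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in full-image pixels


def to_full_frame(point: Point, crop: Box) -> Point:
    """Map a point predicted inside a crop back to full-image coordinates."""
    return (point[0] + crop[0], point[1] + crop[1])


def equivariance_error(full_pred: Point, crop_pred: Point, crop: Box) -> float:
    """Distance between the full-image prediction and the crop prediction
    mapped back to the full frame. Zero means the two views agree."""
    mapped = to_full_frame(crop_pred, crop)
    return math.hypot(full_pred[0] - mapped[0], full_pred[1] - mapped[1])


def aggregate(points: List[Point]) -> Point:
    """Coordinate-wise median (an assumed aggregation rule, robust to outliers)."""
    return (median(p[0] for p in points), median(p[1] for p in points))


def crop_around(center: Point, crop_size: Tuple[float, float],
                image_size: Tuple[float, float]) -> Box:
    """Crop window centred on `center`, shifted so it stays inside the image."""
    cw, ch = crop_size
    w, h = image_size
    left = min(max(center[0] - cw / 2, 0.0), max(w - cw, 0.0))
    top = min(max(center[1] - ch / 2, 0.0), max(h - ch, 0.0))
    return (left, top, min(left + cw, w), min(top + ch, h))


def search_and_aggregate(predict: Callable[[Box], Point],
                         image_size: Tuple[float, float],
                         n_samples: int,
                         crop_size: Tuple[float, float]) -> Point:
    """Two-stage search: aggregate samples on the full image, then re-predict
    on a crop around that estimate and aggregate again in the full frame."""
    full: Box = (0.0, 0.0, image_size[0], image_size[1])
    coarse = aggregate([predict(full) for _ in range(n_samples)])
    crop = crop_around(coarse, crop_size, image_size)
    refined = [to_full_frame(predict(crop), crop) for _ in range(n_samples)]
    return aggregate(refined)
```

In this sketch, `equivariance_error` could serve as a consistency signal (a prediction that moves when the input is cropped is suspect), while `search_and_aggregate` spends extra inference calls to narrow the search region before taking a robust average.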
