CV · HC · Mar 31, 2025

Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up Questions

arXiv:2503.24180v2 · 6 citations · h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in GUI automation, namely users who omit key information from their instructions, and offers an incremental improvement over existing agent paradigms.

The paper tackles the problem of GUI automation agents failing due to incomplete user instructions by introducing a Self-Correction GUI Navigation task that allows agents to ask follow-up questions, resulting in full performance recovery for ambiguous tasks.

Graphical user interface (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which hinders agent performance in the current agent paradigm that does not support immediate user intervention. To address this issue, we introduce a $\textbf{Self-Correction GUI Navigation}$ task that incorporates interactive information completion capabilities within GUI agents. We developed the $\textbf{Navi-plus}$ dataset with GUI follow-up question-answer pairs, alongside a $\textbf{Dual-Stream Trajectory Evaluation}$ method to benchmark this new capability. Our results show that agents equipped with the ability to ask GUI follow-up questions can fully recover their performance when faced with ambiguous user tasks.
