HCAI · Jan 27

GhostUI: Unveiling Hidden Interactions in Mobile UI

arXiv:2601.19258v1 · 1 citation · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses a challenge in mobile task automation that affects both developers and users, though its contribution is incremental: it focuses on dataset creation rather than a new method.

The paper tackles hidden interactions in mobile UIs, which both users and mobile agents find difficult to detect. It introduces GhostUI, a dataset intended to improve vision language models' ability to recognize these interactions and predict post-interaction states; in the paper's evaluations, models fine-tuned on GhostUI outperform baseline VLMs.

Modern mobile applications rely on hidden interactions--gestures without visual cues like long presses and swipes--to provide functionality without cluttering interfaces. While experienced users may discover these interactions through prior use or onboarding tutorials, their implicit nature makes them difficult for most users to uncover. Similarly, mobile agents--systems designed to automate tasks on mobile user interfaces, powered by vision language models (VLMs)--struggle to detect veiled interactions or determine actions for completing tasks. To address this challenge, we present GhostUI, a new dataset designed to enable the detection of hidden interactions in mobile applications. GhostUI provides before-and-after screenshots, simplified view hierarchies, gesture metadata, and task descriptions, allowing VLMs to better recognize concealed gestures and anticipate post-interaction states. Quantitative evaluations with VLMs show that models fine-tuned on GhostUI outperform baseline VLMs, particularly in predicting hidden interactions and inferring post-interaction screens, underscoring GhostUI's potential as a foundation for advancing mobile task automation.
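
To make the dataset description more concrete, here is a minimal sketch of what a single record with the four listed components (before-and-after screenshots, simplified view hierarchy, gesture metadata, task description) might look like, and how it could be turned into a VLM query. The abstract does not publish a schema, so every name here (`HiddenInteractionSample`, `GESTURE_TYPES`, `build_prompt`, the field names, the file paths, and the example values) is a hypothetical illustration, not GhostUI's actual format.

```python
from dataclasses import dataclass
from typing import Any

# Assumed gesture vocabulary: the abstract names only long presses and swipes.
GESTURE_TYPES = {"long_press", "swipe"}


@dataclass
class HiddenInteractionSample:
    """Illustrative record layout; field names are assumptions, not the real schema."""
    before_screenshot: str          # path to the screen before the gesture
    after_screenshot: str           # path to the screen after the gesture
    view_hierarchy: dict[str, Any]  # simplified UI tree of the "before" screen
    gesture: dict[str, Any]         # gesture metadata, e.g. type and target element
    task_description: str           # natural-language task the gesture accomplishes

    def validate(self) -> None:
        """Reject records whose gesture type is outside the assumed vocabulary."""
        gesture_type = self.gesture.get("type")
        if gesture_type not in GESTURE_TYPES:
            raise ValueError(f"unsupported gesture type: {gesture_type!r}")


def build_prompt(sample: HiddenInteractionSample) -> str:
    """Build a text prompt asking a model to identify the hidden gesture."""
    sample.validate()
    return (
        f"Task: {sample.task_description}\n"
        f"Screenshot: {sample.before_screenshot}\n"
        "Which hidden gesture, on which element, completes the task?"
    )


# Made-up example values for illustration only.
example = HiddenInteractionSample(
    before_screenshot="inbox_before.png",
    after_screenshot="inbox_after.png",
    view_hierarchy={"id": "message_row_3", "children": []},
    gesture={"type": "swipe", "direction": "left", "target": "message_row_3"},
    task_description="Archive the third message in the inbox",
)
print(build_prompt(example))
```

In this sketch the after-screenshot and the gesture metadata would serve as supervision targets for the two tasks the abstract mentions: predicting the hidden interaction and inferring the post-interaction screen.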
