Identifying User Goals from UI Trajectories
This work addresses personalization in UI environments for applications such as agents and analytics, but its contribution is incremental because it builds on existing datasets and tasks.
The paper tackles the problem of identifying user goals from UI trajectories and proposes a new task and evaluation methodology. It finds that state-of-the-art models such as GPT-4 and Gemini-1.5 Pro underperform humans, which highlights the difficulty of the task and the room for improvement.
Identifying underlying user goals and intents has been recognized as valuable in various personalization-oriented settings, such as personalized agents, improved search responses, advertising, and user analytics. In this paper, we propose a new task, goal identification from observed UI trajectories, which aims to infer the user's detailed intentions when performing a task within UI environments. To support this task, we also introduce a novel evaluation methodology designed to assess whether two intent descriptions can be considered paraphrases within a specific UI environment. Furthermore, we demonstrate how this task can leverage datasets designed for the inverse problem of UI automation, using Android and web datasets for our experiments. To benchmark this task, we compare the performance of humans and state-of-the-art models, specifically GPT-4 and Gemini-1.5 Pro, using our proposed metric. The results reveal that both GPT-4 and Gemini-1.5 Pro underperform humans, underscoring the difficulty of the proposed task and the significant room for improvement. This work highlights the importance of goal identification within UI trajectories and provides a foundation for further exploration and advancement in this area.