CL · AI · Feb 5, 2025

ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang

arXiv:2502.02955v1 · 17 citations · h-index: 10 · NAACL

Originality: Incremental advance

AI Analysis

This work addresses inefficiencies in how mobile AI agents complete tasks, offering incremental improvements in GUI interaction performance.

The paper tackles the problem of mobile AI agents settling for locally optimal solutions because they ignore the overall GUI flow. It proposes ReachAgent, a two-stage framework, together with the MobileReach dataset, and reports improvements of up to 7.69% on accuracy metrics over the state-of-the-art agent.

Recently, mobile AI agents have gained increasing attention. Given a task, a mobile AI agent interacts with a mobile device over multiple steps and ultimately forms a GUI flow that solves the task. However, existing agents tend to focus on the most task-relevant elements at each step, which leads to locally optimal solutions and ignores the overall GUI flow. To address this issue, we construct a training dataset called MobileReach, which breaks each task into page reaching and page operation subtasks. Furthermore, we propose ReachAgent, a two-stage framework that focuses on improving the agent's task-completion abilities. It uses the page reaching and page operation subtasks, along with reward-based preference GUI flows, to further enhance the agent. Experimental results show that ReachAgent improves IoU Acc and Text Acc by 7.12% and 7.69% at the step level and by 4.72% and 4.63% at the task level compared to the SOTA agent. Our data and code will be released upon acceptance.
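
The abstract reports IoU Acc without defining it. As a rough illustration only, the sketch below assumes a common reading of such a metric: a predicted bounding box counts as correct when its intersection-over-union (IoU) with the ground-truth box meets a threshold. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for this example and may differ from the paper's actual evaluation.

```python
from typing import Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Fraction of predictions whose IoU with the target meets the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)
```

Under these assumptions, a step-level score would average over individual actions, while a task-level score would require every step in a GUI flow to be correct; how the paper aggregates is not stated in the abstract.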
