AI, May 12

Executable Agentic Memory for GUI Agent

Zerui Qin, Sheng Yue, Xingyuan Hua, Yongjian Fu, Ju Ren

arXiv:2605.12294

AI Analysis

This work addresses the fragility and inefficiency of LLM-based GUI agents on long-horizon automation tasks.

EAM introduces a structured knowledge graph for GUI agents that replaces step-wise LLM planning with retrieval-and-execution. It improves on UI-TARS-7B by up to 19.6% on AndroidWorld and cuts token costs 6x relative to GPT-4o, with an average latency of 2.8s.

Modern GUI agents typically rely on a model-centric and step-wise interaction paradigm, where LLMs must re-interpret the UI and re-decide actions at every screen, which is fragile in long-horizon tasks. In this paper, we propose Executable Agentic Memory (EAM), a structured Knowledge Graph (KG) that shifts GUI planning from free-form generation to a robust retrieval-and-execution process. Our approach includes a sample-efficient memory construction pipeline using state-aware DFS and action-group mining to compress multi-step routines. To ensure efficient planning, we introduce a value-guided graph search where a lightweight Q-function model steers Monte Carlo Tree Search (MCTS) over the KG. We theoretically establish bias-consistency for the Q-model and derive sample complexity bounds for path recovery. Empirically, EAM outperforms state-of-the-art baselines like UI-TARS-7B by up to $19.6\%$ on AndroidWorld, while reducing token costs $6\times$ relative to GPT-4o. With a $2.8$s average latency, EAM enables reliable, quick, and long-horizon GUI automation.
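To make the retrieval-and-execution idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the memory can be modeled as a directed graph whose nodes are screens and whose edges are actions, and it uses a value-guided best-first search as a simpler stand-in for the MCTS described in the abstract. The graph contents, the `toy_q` scores, and all function names are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy memory: screen -> list of (action, next_screen) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def value_guided_search(
    graph: Graph,
    start: str,
    goal: str,
    q_value: Callable[[str, str], float],
    max_expansions: int = 1000,
) -> Optional[List[str]]:
    """Best-first search that expands the edge with the highest Q score first.

    This is a simplified stand-in for value-guided MCTS: the Q-function only
    orders the frontier, and there are no rollouts or visit statistics.
    Returns the list of actions from start to goal, or None if none is found.
    """
    counter = 0  # tie-breaker so the heap never compares plans
    frontier = [(0.0, counter, start, [])]
    visited = set()
    while frontier and max_expansions > 0:
        _, _, screen, plan = heapq.heappop(frontier)
        if screen == goal:
            return plan
        if screen in visited:
            continue
        visited.add(screen)
        max_expansions -= 1
        for action, next_screen in graph.get(screen, []):
            if next_screen not in visited:
                counter += 1
                # heapq is a min-heap, so negate the score to pop high Q first.
                priority = -q_value(screen, action)
                heapq.heappush(
                    frontier, (priority, counter, next_screen, plan + [action])
                )
    return None


# Assumed example data: a tiny settings-navigation graph and made-up Q scores.
TOY_GRAPH: Graph = {
    "home": [("open_settings", "settings"), ("open_browser", "browser")],
    "settings": [("tap_wifi", "wifi"), ("tap_display", "display")],
    "browser": [("go_back", "home")],
    "wifi": [],
    "display": [],
}
TOY_Q = {
    ("home", "open_settings"): 0.9,
    ("home", "open_browser"): 0.1,
    ("settings", "tap_wifi"): 0.8,
    ("settings", "tap_display"): 0.2,
    ("browser", "go_back"): 0.1,
}


def toy_q(screen: str, action: str) -> float:
    """Look up a made-up score; unseen edges default to 0.0."""
    return TOY_Q.get((screen, action), 0.0)


plan = value_guided_search(TOY_GRAPH, "home", "wifi", toy_q)
print(plan)  # ['open_settings', 'tap_wifi']
```

In this sketch the returned plan is a stored action sequence that an executor could replay without calling an LLM at each screen, which is the property the abstract attributes to retrieval-and-execution. How EAM actually scores edges, groups actions, and handles execution failures is not specified in this listing.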
