HCAI · Jul 29, 2025

MapAgent: Trajectory-Constructed Memory-Augmented Planning for Mobile Task Automation

arXiv:2507.21953v1 · 6 citations · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the automation of complex tasks on mobile devices for users and developers. It is an incremental improvement that adds memory and planning mechanisms to existing LLM-based agents.

LLM-based agents lack knowledge about real-life mobile applications, which leads to ineffective planning and hallucinations on complex tasks. The paper proposes MapAgent, a framework that augments planning with memory constructed from past execution trajectories, and reports superior performance to existing methods in real-world scenarios.

The recent advancement of autonomous agents powered by Large Language Models (LLMs) has demonstrated significant potential for automating tasks on mobile devices through graphical user interfaces (GUIs). Despite initial progress, these agents still face challenges when handling complex real-world tasks. These challenges arise from a lack of knowledge about real-life mobile applications in LLM-based agents, which may lead to ineffective task planning and even cause hallucinations. To address these challenges, we propose a novel LLM-based agent framework called MapAgent that leverages memory constructed from historical trajectories to augment current task planning. Specifically, we first propose a trajectory-based memory mechanism that transforms task execution trajectories into a reusable and structured page-memory database. Each page within a trajectory is extracted as a compact yet comprehensive snapshot, capturing both its UI layout and functional context. Secondly, we introduce a coarse-to-fine task planning approach that retrieves relevant pages from the memory database based on similarity and injects them into the LLM planner to compensate for potential deficiencies in understanding real-world app scenarios, thereby achieving more informed and context-aware task planning. Finally, planned tasks are transformed into executable actions through a task executor supported by a dual-LLM architecture, ensuring effective tracking of task progress. Experimental results in real-world scenarios demonstrate that MapAgent achieves superior performance to existing methods. The code will be open-sourced to support further research.
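The retrieve-and-inject step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PageMemory` fields, the function names, the prompt wording, and the example pages are all hypothetical, and a bag-of-words cosine similarity stands in for whatever similarity measure the paper actually uses (the abstract does not say).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageMemory:
    """Hypothetical page snapshot: UI layout plus functional context."""
    app: str
    ui_layout: str
    function: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for a real text embedding.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(task: str, memory: list[PageMemory], k: int = 2) -> list[PageMemory]:
    # Score every stored page against the task and keep the top k
    # pages that share at least one token with it.
    query = Counter(tokenize(task))
    scored = [
        (cosine(query, Counter(tokenize(p.ui_layout + " " + p.function))), p)
        for p in memory
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_planner_prompt(task: str, pages: list[PageMemory]) -> str:
    # Inject the retrieved pages into the text handed to an LLM planner.
    lines = [f"Task: {task}", "Relevant pages from memory:"]
    for p in pages:
        lines.append(f"- [{p.app}] layout: {p.ui_layout}; function: {p.function}")
    lines.append("Plan the task as ordered sub-tasks.")
    return "\n".join(lines)


# Illustrative data only; not taken from the paper.
memory = [
    PageMemory("Clock", "alarm list, add alarm button", "create and edit alarms"),
    PageMemory("Messages", "conversation list, new message button",
               "start or open a message conversation"),
    PageMemory("Settings", "wifi toggle, bluetooth toggle", "change network settings"),
]
task = "Send a message to Alice"
print(build_planner_prompt(task, retrieve(task, memory)))
```

In a real system the token-count vectors would likely be replaced by learned embeddings and the page snapshots would be extracted from recorded trajectories; the sketch only shows how retrieved pages reach the planner's input.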

