AI · Dec 11, 2025

AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

arXiv:2512.10371v1 · 5 citations · h-index: 15 · Has Code
Originality: Highly original
AI Analysis

This paper addresses a critical bottleneck in mobile GUI agent automation. It matters to researchers and developers because it improves both efficiency and performance on long-horizon tasks.

The paper tackles context overhead in long-horizon GUI agents. It proposes AgentProg, a program-guided context management approach that reframes the interaction history as a program. AgentProg achieves state-of-the-art success rates on benchmarks such as AndroidWorld and maintains robust performance on long-horizon tasks where baselines degrade.

The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on an ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach to agent context management that reframes the interaction history as a program with variables and control flow. Organizing information according to the structure of a program provides a principled mechanism for determining which information should be retained and which can be discarded. We further integrate a global belief state mechanism, inspired by the Belief MDP framework, to handle partial observability and adapt to unexpected environmental changes. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg achieves state-of-the-art success rates on these benchmarks. More importantly, it maintains robust performance on long-horizon tasks, while baseline methods experience catastrophic degradation. Our system is open-sourced at https://github.com/MobileLLM/AgentProg.
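
To make the idea of "interaction history as a program" concrete, here is a minimal sketch under stated assumptions. It is not the AgentProg implementation: the names `ProgramContext`, `Scope`, `enter`, `record`, `assign`, `exit`, and `render` are all hypothetical, and the retention rule (keep variables and the belief state, drop raw steps when a control-flow block closes) is only one plausible reading of the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """One open control-flow block (e.g. a loop iteration). Hypothetical."""
    label: str
    steps: list[str] = field(default_factory=list)  # raw, discardable records


@dataclass
class ProgramContext:
    """Hypothetical context store: variables, a belief state, open scopes."""
    variables: dict[str, object] = field(default_factory=dict)
    belief: dict[str, object] = field(default_factory=dict)
    scopes: list[Scope] = field(default_factory=list)

    def enter(self, label: str) -> None:
        self.scopes.append(Scope(label))

    def record(self, step: str) -> None:
        # Raw interaction records live only in the innermost open scope.
        self.scopes[-1].steps.append(step)

    def assign(self, name: str, value: object) -> None:
        # Values bound to variables survive after their scope closes.
        self.variables[name] = value

    def exit(self) -> int:
        # Closing a scope drops its raw steps; returns how many were dropped.
        return len(self.scopes.pop().steps)

    def render(self) -> str:
        # Text an agent would see: variables, beliefs, and only open scopes.
        lines = [f"{k} = {v!r}" for k, v in self.variables.items()]
        lines += [f"belief.{k} = {v!r}" for k, v in self.belief.items()]
        for scope in self.scopes:
            lines.append(f"in {scope.label}:")
            lines += [f"  {s}" for s in scope.steps]
        return "\n".join(lines)


# Assumed example task: look up two items and remember which were handled.
ctx = ProgramContext()
ctx.enter("main")
ctx.assign("done", [])
for item in ["milk", "eggs"]:
    ctx.enter(f"for item={item}")
    ctx.record(f"tap search; type {item}")
    ctx.record("read price from result page")
    ctx.assign("done", ctx.variables["done"] + [item])
    ctx.exit()  # the per-iteration steps are discarded here
ctx.belief["current_app"] = "shopping"
print(ctx.render())
```

In this sketch the rendered context stays roughly constant in size as the loop runs, because each finished iteration contributes only its variable updates. How AgentProg itself decides what to retain, and how its belief state is updated, is defined in the paper and repository rather than here.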

