Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents
For LLM agent developers, this work addresses the bottleneck of execution-state management in long-horizon tasks, offering a practical improvement over semantic memory systems.
LLM-based agents struggle with long-horizon tasks due to cascading errors from fragmented execution states. MAGE, a hierarchical state tree manager, improves task success rate by 7.8–20.4 percentage points and reduces token consumption by 55.1% on MemoryArena.
LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by semantic similarity, retrieving content-relevant entries at decision time. We argue that this design mismatches execution-state dependencies: it fragments decision trajectories and mixes valid and erroneous traces, hindering coherent state reconstruction and error isolation. We propose MAGE (Memory as Agent-Guided Exploration), an active execution-state manager that stores interactions in a hierarchical state tree. The agent derives its state from the active root-to-current path, combining subgoal summaries, recent traces, and hints from prior branches. Four coupled operations maintain the tree: Grow records new traces, Compress summarizes completed subgoals, Maintain validates summaries, and Revise restores a target boundary and resumes on a new branch. This design bounds context growth while preserving state integrity and isolating flawed segments from the active path. Experiments on MemoryArena show that MAGE improves the average task success rate by 7.8–20.4 pp over baselines, while reducing token consumption by 55.1%.
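To make the four operations concrete, the following is a minimal Python sketch of a hierarchical state tree. It is an illustration under stated assumptions, not the authors' implementation: the names `Node`, `StateTree`, and `open_subgoal`, the `summarize` and `is_valid` callbacks, and the `recent` window size are all hypothetical, and an LLM would normally supply the summarization and validation steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    """One subgoal segment of the tree (hypothetical structure)."""
    goal: str
    parent: Optional["Node"] = None
    traces: List[str] = field(default_factory=list)
    summary: Optional[str] = None   # set once the subgoal is compressed
    hints: List[str] = field(default_factory=list)  # lessons from abandoned branches
    children: List["Node"] = field(default_factory=list)
    abandoned: bool = False


class StateTree:
    def __init__(self, task: str, recent: int = 3):
        self.root = Node(goal=task)
        self.current = self.root
        self.recent = recent  # assumed cap on raw traces kept in the state

    def open_subgoal(self, goal: str) -> Node:
        child = Node(goal=goal, parent=self.current)
        self.current.children.append(child)
        self.current = child
        return child

    def grow(self, trace: str) -> None:
        """Grow: record a new trace under the current subgoal."""
        self.current.traces.append(trace)

    def compress(self, summarize: Callable[[List[str]], str]) -> Node:
        """Compress: summarize the finished subgoal and return to its parent."""
        node = self.current
        node.summary = summarize(node.traces)
        if node.parent is not None:
            self.current = node.parent
        return node

    def maintain(self, node: Node, is_valid, summarize) -> None:
        """Maintain: regenerate a summary that fails validation."""
        if node.summary is None or not is_valid(node.summary, node.traces):
            node.summary = summarize(node.traces)

    def revise(self, boundary: Node, hint: str, goal: str) -> Node:
        """Revise: abandon the path below `boundary`, keep a hint, start a new branch."""
        path = self.active_path()
        if not any(n is boundary for n in path):
            raise ValueError("boundary must lie on the active path")
        node = self.current
        while node is not boundary:
            node.abandoned = True
            node = node.parent
        boundary.hints.append(hint)
        self.current = boundary
        return self.open_subgoal(goal)

    def active_path(self) -> List[Node]:
        path, node = [], self.current
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def state(self) -> List[str]:
        """Build the agent state from the root-to-current path only."""
        lines = []
        for node in self.active_path():
            lines.append(f"goal: {node.goal}")
            lines += [f"hint: {h}" for h in node.hints]
            lines += [f"done: {c.summary}" for c in node.children
                      if c.summary is not None and not c.abandoned]
        lines += [f"trace: {t}" for t in self.current.traces[-self.recent:]]
        return lines


# Usage: one completed subgoal, one flawed branch that is revised away.
tree = StateTree("book trip")
tree.open_subgoal("find flight")
tree.grow("searched flights")
tree.grow("picked F1")
tree.compress(lambda traces: f"{len(traces)} steps")
tree.open_subgoal("book hotel")
tree.grow("booked wrong dates")
tree.revise(tree.root, "check dates before booking", "book hotel (retry)")
tree.grow("verified dates")
state = tree.state()
```

In this sketch the abandoned "book hotel" traces stay in the tree but never reach `state`, which only walks the active path; the erroneous segment survives solely as a one-line hint. Completed subgoals contribute a summary rather than their raw traces, which is how the context stays bounded here.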