Mem-T: Densifying Rewards for Long-Horizon Memory Agents
This work addresses training inefficiency in memory agents and enables better optimization of memory-management policies, although the contribution is incremental because it builds on existing memory-agent frameworks.
The paper tackles sparse and delayed rewards in training long-horizon memory agents. It introduces Mem-T, an autonomous memory agent backed by a hierarchical memory database, and MoT-GRPO, a reinforcement learning framework that densifies rewards. Mem-T outperforms A-Mem and Mem0 by up to 14.92% and reduces inference tokens per query by ~24.45% relative to GAM.
Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse and delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via memory operation tree backpropagation and hindsight credit assignment, thereby enabling the joint optimization of memory construction and retrieval. Extensive experiments demonstrate that Mem-T is (1) high-performing, surpassing frameworks such as A-Mem and Mem0 by up to 14.92%, and (2) economical, operating on a favorable accuracy-efficiency Pareto frontier and reducing inference tokens per query by ~24.45% relative to GAM without sacrificing performance.
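The abstract does not give MoT-GRPO's equations, so the following is only a rough illustration of the general idea of turning one terminal reward into step-wise signals over a tree of memory operations. It is a minimal sketch under three assumptions that are not taken from the paper: (1) each rollout branch ends in a leaf whose final answer receives a scalar reward, (2) an internal node's value is the average of its children's values, and (3) a step's advantage is its value normalized against the siblings that share its parent, in the spirit of GRPO's group-relative advantages. The names `OpNode`, `backpropagate`, and `step_advantages`, and the operation labels, are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Optional


@dataclass
class OpNode:
    """One memory operation (hypothetical labels, e.g. insert or skip) in a rollout tree."""
    op: str
    children: list["OpNode"] = field(default_factory=list)
    terminal_reward: Optional[float] = None  # assumed: only leaves carry a scored final answer
    value: float = 0.0  # filled in by backpropagate()


def backpropagate(node: OpNode) -> float:
    """Set each node's value: a leaf takes its terminal reward, an internal
    node takes the mean of its children's values (an assumed rule)."""
    if not node.children:
        if node.terminal_reward is None:
            raise ValueError(f"leaf {node.op!r} has no terminal reward")
        node.value = node.terminal_reward
    else:
        node.value = mean(backpropagate(c) for c in node.children)
    return node.value


def step_advantages(node: OpNode, eps: float = 1e-6) -> list[tuple[str, float]]:
    """Group-relative advantage per step: each child's value is normalized
    against the mean and population std of its siblings."""
    out: list[tuple[str, float]] = []
    if len(node.children) > 1:
        vals = [c.value for c in node.children]
        mu, sigma = mean(vals), pstdev(vals)
        for c in node.children:
            out.append((c.op, (c.value - mu) / (sigma + eps)))
    elif node.children:
        # A lone child has no sibling group to compare against: zero advantage.
        out.append((node.children[0].op, 0.0))
    for c in node.children:
        out.extend(step_advantages(c, eps))
    return out


# Toy tree: storing a fact leads to one correct and one wrong answer;
# skipping it leads only to a wrong answer.
root = OpNode("start", [
    OpNode("insert_fact", [OpNode("answer_a", terminal_reward=1.0),
                           OpNode("answer_b", terminal_reward=0.0)]),
    OpNode("skip_fact", [OpNode("answer_c", terminal_reward=0.0)]),
])
backpropagate(root)
adv = dict(step_advantages(root))
print(root.value, {k: round(v, 3) for k, v in adv.items()})
```

In this toy tree the early `insert_fact` step receives a positive advantage and `skip_fact` a negative one, even though neither produced a reward directly. Comparing each step only with alternatives from the same prefix is one simple way to assign credit to intermediate memory operations; the paper's hindsight credit assignment may differ.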