Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents
For developers of long-horizon LLM agents, this work highlights the need for personalized memory policies and provides a benchmark to drive progress, though the proposed session-level gating is only as effective as its gate is accurate, and accurate gating remains an open challenge.
The paper identifies that existing LLM-based memory systems use universal, static policies that fail to personalize what to store for different users, wasting memory on transient interactions. It introduces PerMemBench, the first benchmark for personalized memory, and proposes session-level storage gating, showing that perfect personalization yields substantial retention gains but accurate gating remains an open challenge.
Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts worth storing in memory differ across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.
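To make the idea of session-level storage gating concrete, the following is a minimal sketch, not the paper's implementation. It assumes a per-user gate that returns a boolean for each session, and it bypasses every memory operation when the gate rejects the session. The names `Session`, `GatedMemory`, `make_keyword_gate`, and `run_demo` are hypothetical, and the keyword-overlap gate is only a placeholder for whatever learned gate a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Session:
    user_id: str
    text: str


@dataclass
class GatedMemory:
    # gate(session) returns True if the session is worth storing for this user.
    gate: Callable[[Session], bool]
    store: List[Session] = field(default_factory=list)
    skipped: int = 0

    def process(self, session: Session) -> bool:
        """Run memory operations only when the gate accepts the session."""
        if not self.gate(session):
            # Transient session: bypass all memory operations.
            self.skipped += 1
            return False
        # Stand-in for the real memory write path (extraction, update, etc.).
        self.store.append(session)
        return True


def make_keyword_gate(interests: Dict[str, Set[str]]) -> Callable[[Session], bool]:
    """Hypothetical per-user gate: keep sessions that mention a user's long-term topics."""
    def gate(session: Session) -> bool:
        topics = interests.get(session.user_id, set())
        words = set(session.text.lower().split())
        return bool(topics & words)
    return gate


def run_demo():
    # Assumed per-user interests; in practice these would be learned, not hand-written.
    gate = make_keyword_gate({"alice": {"thesis", "marathon"}, "bob": {"mortgage"}})
    memory = GatedMemory(gate=gate)
    sessions = [
        Session("alice", "Help me outline chapter two of my thesis"),
        Session("alice", "What is the weather tomorrow"),
        Session("bob", "Compare mortgage rates for me"),
        Session("bob", "Draft notes on my thesis"),
    ]
    decisions = [memory.process(s) for s in sessions]
    return decisions, memory


if __name__ == "__main__":
    decisions, memory = run_demo()
    print(decisions, len(memory.store), memory.skipped)
```

In this toy run the word "thesis" triggers storage for one user but not the other, which is the personalization the abstract argues for: the same content can be worth keeping for one user and transient for another. A gate that errs in either direction either wastes memory budget or drops needed context, which is why gating accuracy is the bottleneck.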