EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents
For developers of long-horizon AI agents, EMBER targets memory efficiency: a learned retention policy keeps answer-relevant source evidence within a fixed token budget, so the agent depends less on rereading larger histories.
EMBER introduces a learned retention policy for long-horizon agents that selects which source evidence to keep under a fixed token budget before queries are known. On LongMemEval-RR, EMBER-14B achieves 0.3017 F1 at the 8192-token retained-evidence budget, compared with 0.1765 F1 for the strongest non-EMBER budgeted baseline.
Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget? We instantiate this setting as Budgeted Pre-Query Retention, where memory is written during ingestion and later read without access to the full raw stream. We introduce EMBER, a learned retention policy that constructs a compact, source-backed evidence state. EMBER stores evidence capsules: verbatim source excerpts paired with retrieval keys and update metadata, preserving both grounding and read-time access. Post-query outcome feedback trains the writer to preserve evidence across the ingestion-retrieval-answer chain. On LongMemEval-RR, our LongMemEval-derived retained-evidence protocol, EMBER-14B reaches 0.3017 F1 at the 8192-token retained-evidence comparison point, compared with 0.1765 for the strongest non-EMBER budgeted baseline. Across retained source-evidence budgets, EMBER improves F1, Retain-Recall, and Read-Recall, indicating that long-horizon memory depends on retaining evidence within the budget rather than rereading larger histories.
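To make the evidence-capsule idea concrete, the following is a minimal sketch of budgeted pre-query retention, not the EMBER implementation. The names `EvidenceCapsule`, `retain`, and `read` are hypothetical, the `score` field is a hand-set stand-in for whatever the learned writer would produce, token cost is approximated by whitespace splitting, and the greedy score-per-token selection is a simple substitute for the learned policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EvidenceCapsule:
    excerpt: str                      # verbatim source text, never paraphrased
    keys: List[str]                   # retrieval keys used at read time
    metadata: Dict[str, object] = field(default_factory=dict)  # e.g. session id
    score: float = 0.0                # hypothetical stand-in for a learned retention score

    def token_cost(self) -> int:
        # Assumption: whitespace tokens approximate the real tokenizer count.
        return len(self.excerpt.split())


def retain(capsules: List[EvidenceCapsule], budget: int) -> List[EvidenceCapsule]:
    """Greedy stand-in for a retention policy.

    Ranks capsules by score per token and keeps each one that still fits
    in the remaining budget. The query is not available at this stage.
    """
    ranked = sorted(
        capsules,
        key=lambda c: c.score / max(c.token_cost(), 1),
        reverse=True,
    )
    kept: List[EvidenceCapsule] = []
    used = 0
    for capsule in ranked:
        cost = capsule.token_cost()
        if used + cost <= budget:
            kept.append(capsule)
            used += cost
    return kept


def read(kept: List[EvidenceCapsule], query: str) -> List[EvidenceCapsule]:
    """Read time: match query terms against retrieval keys only.

    The raw history is unavailable here, so anything dropped by
    `retain` cannot be recovered.
    """
    terms = set(query.lower().split())
    return [c for c in kept if terms & {k.lower() for k in c.keys}]


stream = [
    EvidenceCapsule("Alice moved to Lisbon in March.", ["alice", "lisbon"], {"session": 1}, 0.9),
    EvidenceCapsule("We chatted about the weather for a while today.", ["weather"], {"session": 1}, 0.1),
    EvidenceCapsule("Alice's new job starts in April.", ["alice", "job"], {"session": 2}, 0.8),
]
kept = retain(stream, budget=12)
hits = read(kept, "where does alice live")
```

With a 12-token budget, the two six-token capsules about Alice fit and the nine-token small-talk capsule is dropped, so a later query mentioning "alice" can still reach both retained excerpts. A query about the weather would find nothing, which illustrates why the write-time decision matters when the raw stream is no longer accessible.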