CL · Jun 4

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

arXiv:2606.05894
AI Analysis

For developers of long-horizon AI agents, EMBER addresses the problem of efficient memory management by showing that learned evidence retention outperforms rereading larger histories.

EMBER introduces a learned retention policy for long-horizon agents that selects which source evidence to keep under a fixed token budget before queries are known. On LongMemEval-RR, EMBER-14B achieves 0.3017 F1 at the 8192-token retained-evidence budget, outperforming the strongest non-EMBER budgeted baseline (0.1765 F1).

Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget? We instantiate this setting as Budgeted Pre-Query Retention, where memory is written during ingestion and later read without access to the full raw stream. We introduce EMBER, a learned retention policy that constructs a compact, source-backed evidence state. EMBER stores evidence capsules: verbatim source excerpts paired with retrieval keys and update metadata, preserving both grounding and read-time access. Post-query outcome feedback trains the writer to preserve evidence across the ingestion-retrieval-answer chain. On LongMemEval-RR, our LongMemEval-derived retained-evidence protocol, EMBER-14B reaches 0.3017 F1 at the 8192-token retained-evidence comparison point, compared with 0.1765 for the strongest non-EMBER budgeted baseline. Across retained source-evidence budgets, EMBER improves F1, Retain-Recall, and Read-Recall, indicating that long-horizon memory depends on retaining evidence within the budget rather than rereading larger histories.
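To make the evidence-capsule idea concrete, here is a minimal Python sketch of budgeted pre-query retention. It is not EMBER's implementation: the abstract does not specify the capsule schema, the tokenizer, or the learned writer, so the `EvidenceCapsule` fields, the whitespace token count, the hand-assigned `score`, and the greedy score-per-token selection are all assumptions standing in for the learned policy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EvidenceCapsule:
    """Hypothetical capsule layout: verbatim excerpt + retrieval keys + metadata."""
    excerpt: str             # verbatim source text (kept unparaphrased for grounding)
    keys: Tuple[str, ...]    # retrieval keys used at read time
    turn: int                # update metadata: ingestion step the excerpt came from
    score: float             # assumed retention score; EMBER would learn this signal

    @property
    def cost(self) -> int:
        # Whitespace token count: a crude stand-in for a real tokenizer.
        return len(self.excerpt.split())


def retain(capsules: List[EvidenceCapsule], budget: int) -> List[EvidenceCapsule]:
    """Greedy stand-in for a retention policy: keep the best score-per-token
    capsules that fit within the token budget. No query is available here."""
    kept, used = [], 0
    ranked = sorted(capsules, key=lambda c: c.score / max(c.cost, 1), reverse=True)
    for c in ranked:
        if used + c.cost <= budget:
            kept.append(c)
            used += c.cost
    return kept


def read(kept: List[EvidenceCapsule], query: str) -> List[EvidenceCapsule]:
    """Read-time lookup over retained capsules only (no access to the raw stream)."""
    terms = set(query.lower().split())
    return [c for c in kept if terms & set(c.keys)]


stream = [
    EvidenceCapsule("Alice moved to Lisbon in March", ("alice", "lisbon", "moved"), 1, 0.9),
    EvidenceCapsule("The weather was nice that day and everyone was happy about it",
                    ("weather",), 2, 0.2),
    EvidenceCapsule("Bob's budget is 500 euros", ("bob", "budget"), 3, 0.7),
]

kept = retain(stream, budget=12)               # write memory before any query exists
hits = read(kept, "where does alice live")     # later, answer from retained evidence
```

With a 12-token budget, the low-value weather excerpt is dropped at write time, so a later query about the weather could not be answered from retained memory. That is the failure mode the abstract's Retain-Recall and Read-Recall metrics appear designed to separate: evidence can be lost at retention, or retained but not surfaced at read time.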