From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents
This addresses inefficiencies and verification issues in memory management for AI agents, offering a domain-specific incremental improvement.
The paper tackles the problem of long-horizon agents compressing interaction histories into summaries, which can cause unverifiable omissions and inefficiencies when querying raw logs, by proposing TierMem, a provenance-linked framework that uses a two-tier memory hierarchy to reduce input tokens by 54.1% and latency by 60.7% while maintaining 0.851 accuracy compared to 0.873 raw-only.
Long-horizon agents often compress interaction histories into write-time summaries. This creates a fundamental write-before-query barrier: compression decisions are made before the system knows what a future query will hinge on. As a result, summaries can cause unverifiable omissions -- decisive constraints (e.g., allergies) may be dropped, leaving the agent unable to justify an answer with traceable evidence. Retaining raw logs restores an authoritative source of truth, but grounding on raw logs by default is expensive: many queries are answerable from summaries, yet raw grounding still requires processing far longer contexts, inflating token consumption and latency. We propose TierMem, a provenance-linked framework that casts retrieval as an inference-time evidence allocation problem. TierMem uses a two-tier memory hierarchy to answer with the cheapest sufficient evidence: it queries a fast summary index by default, and a runtime sufficiency router Escalates to an immutable raw-log store only when summary evidence is insufficient. TierMem then writes back verified findings as new summary units linked to their raw sources. On LoCoMo, TierMem achieves 0.851 accuracy (vs.0.873 raw-only) while reducing input tokens by 54.1\% and latency by 60.7%.