CL · AI · Feb 22

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

arXiv:2602.19320v1 · 8 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

The paper addresses unreliable evaluation and system inefficiencies in agentic memory, a problem relevant to AI researchers and developers. Its contribution is incremental, however: it synthesizes existing knowledge rather than introducing new methods.

This survey analyzes agentic memory systems for LLM agents. It finds that current benchmarks are underscaled, evaluation metrics are misaligned with semantic utility, accuracy varies across backbone models, and system costs are often overlooked. It offers a taxonomy and an empirical analysis of these limitations to guide better evaluation and system design.

Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows. Despite rapid architectural development, the empirical foundations of these systems remain fragile: existing benchmarks are often underscaled, evaluation metrics are misaligned with semantic utility, performance varies significantly across backbone models, and system-level costs are frequently overlooked. This survey presents a structured analysis of agentic memory from both architectural and system perspectives. We first introduce a concise taxonomy of memory-augmented generation (MAG) systems based on four memory structures. Then, we analyze key pain points limiting current systems, including benchmark saturation effects, metric validity and judge sensitivity, backbone-dependent accuracy, and the latency and throughput overhead introduced by memory maintenance. By connecting the memory structure to empirical limitations, this survey clarifies why current agentic memory systems often underperform their theoretical promise and outlines directions for more reliable evaluation and scalable system design.
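The point that lexical metrics can be misaligned with semantic utility can be illustrated with token-overlap F1, a metric commonly used to score question answering over long conversations. The sketch below is a minimal illustration, not the survey's own evaluation code: it assumes SQuAD-style normalization (lowercasing, stripping punctuation and English articles), and the gold answer and the two predictions are hypothetical. A correct but paraphrased answer scores lower than a factually wrong answer that happens to share a token with the reference.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "When did she move to Paris?"
gold = "in 2021"
correct = "It was 2021."  # semantically correct paraphrase
wrong = "in 2012"         # wrong year, but shares the token "in"

print(f"correct answer F1: {token_f1(correct, gold):.2f}")  # 0.40
print(f"wrong answer F1:   {token_f1(wrong, gold):.2f}")    # 0.50
```

Because the score depends only on surface-token overlap, it rewards the wrong answer over the correct one. This is one reason evaluations may turn to semantic metrics or LLM judges, which in turn raise the judge-sensitivity concerns the survey lists.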

