A-MEM
A-MEM: Agentic Memory for LLM Agents
Agent / long-term memory · first seen Feb 17, 2025
Heavily superseded: a standard baseline that newer methods routinely beat.
5 papers critique it · 12 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites A-MEM as a baseline.
“However, most existing agent-memory approaches still rely on unweighted or weakly weighted relations, where an edge primarily indicates the existence of a connection rather than its query-dependent utility.”
— HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
“Current frameworks operate as synchronous, ‘append-and-evolve-all’ systems. Every user utterance—regardless of its information density—is forced through the entire memory construction and evolution pipeline. In production, this design inevitably leads to an O(N²) computational complexity for memory updates as the interaction history grows.”
— D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing
“trigger updates based on arbitrary token counts or time-steps rather than semantic completeness, failing to prevent the corruption of stable knowledge by transient dialogue states.”
— GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
“However, they rely on implicit, unstructured associations rather than explicit schemas for modeling information evolution across sessions. This approach can lead to arbitrary links and inconsistent interpretations that are difficult to analyze.”
— Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue
“However, prior work typically organizes memory around associative proximity (e.g., semantic similarity) rather than mechanistic dependency [kiciman2023causal]. As a result, such methods can retrieve what occurred but struggle to reason about why, since they lack explicit representations of causal structure”
— MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
Beaten on benchmarks
Head-to-head results where a newer method reports beating A-MEM. Values are copied from the source paper's tables; verify them against the cited paper.
- HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
  - HAGE beats A-MEM · Overall [GPT-4o-mini]: 0.739 vs 0.580
  - HAGE beats A-MEM · Overall [Qwen2.5-3B]: 0.548 vs 0.410
  - HAGE beats A-MEM · LLM Score [GPT-4o-mini]: 0.824 vs 0.547
  - HAGE beats A-MEM · F1 [GPT-4o-mini]: 0.678 vs 0.433
  - HAGE beats A-MEM · LLM Score [Qwen2.5-3B]: 0.527 vs 0.416
  - HAGE beats A-MEM · F1 [Qwen2.5-3B]: 0.429 vs 0.186
  - HAGE beats A-MEM · Avg. Score [all]: 0.739 vs 0.580
- D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing
  - D-MEM beats A-MEM · Overall F1 [clean LoCoMo benchmark]: 37.4 vs 35.94
  - D-MEM beats A-MEM · Multi-hop F1 [clean LoCoMo benchmark]: 42.7 vs 27.0
  - D-MEM beats A-MEM · Overall F1 [extreme noise, ρ = 0.75]: 0.369 vs 0.336
- MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
  - MemGuard beats A-MEM · Avg. [LoCoMo; base LLM: GPT-4.1-mini, judge LLM: GPT-4.1]: 77.29 vs 59.62
- GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
  - GRAVITY beats A-MEM · LLM-judge accuracy [LongMemEval Micro]: 63.2 vs 53.8
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 9, 2026
- May 30, 2026
- MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models · May 27, 2026
- DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA · May 21, 2026
- May 20, 2026
- May 3, 2026
- Apr 23, 2026
- Apr 2, 2026
- Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory · Mar 17, 2026
- Mar 15, 2026
- Jan 13, 2026
- AgeMem · Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents · Jan 5, 2026