A-MEM (Agent / long-term memory): heavily superseded, a standard baseline that newer methods routinely beat. 5 papers critique it and 12 beat it on benchmarks, ranking it #2 of 63 most-superseded. Sub-problem: cluster led by Mem0. Newer alternatives in the same sub-problem include REAL, MemPro, MemGuard, DeferMem, Memory-R2.

Heavily superseded · #2 of 63 most-superseded

A-MEM

A-MEM: Agentic Memory for LLM Agents

Agent / long-term memory · first seen Feb 17, 2025

heavily superseded — a standard baseline that newer methods routinely beat

5 papers critique it · 12 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites A-MEM as a baseline.

“However, most existing agent-memory approaches still rely on unweighted or weakly weighted relations, where an edge primarily indicates the existence of a connection rather than its query-dependent utility.”
— HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
“Current frameworks operate as synchronous, ‘append-and-evolve-all’ systems. Every user utterance – regardless of its information density – is forced through the entire memory construction and evolution pipeline. In production, this design inevitably leads to an $O(N^2)$ computational complexity for memory updates as the interaction history grows.”
— D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing
“trigger updates based on arbitrary token counts or time-steps rather than semantic completeness, failing to prevent the corruption of stable knowledge by transient dialogue states.”
— GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
“However, they rely on implicit, unstructured associations rather than explicit schemas for modeling information evolution across sessions. This approach can lead to arbitrary links and inconsistent interpretations that are difficult to analyze.”
— Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue
“However, prior work typically organizes memory around associative proximity (e.g., semantic similarity) rather than mechanistic dependency [kiciman2023causal]. As a result, such methods can retrieve what occurred but struggle to reason about why, since they lack explicit representations of causal structure”
— MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
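
The $O(N^2)$ claim in the D-MEM critique can be illustrated with a simple cost model. The sketch below assumes, purely for illustration, that every new utterance is compared against every memory note already stored; the function name and the cost model are hypothetical and are not taken from the A-MEM or D-MEM papers.

```python
def append_and_evolve_all_cost(num_utterances: int) -> int:
    """Count note-to-note comparisons under a hypothetical cost model in
    which each incoming utterance is checked against all stored notes."""
    comparisons = 0
    stored = 0
    for _ in range(num_utterances):
        comparisons += stored  # the new note is compared with every existing note
        stored += 1            # then it is appended to memory
    return comparisons


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Closed form of the loop above: n * (n - 1) / 2, which grows as O(N^2).
        print(n, append_and_evolve_all_cost(n), n * (n - 1) // 2)
```

Under this assumption, a tenfold increase in history length produces roughly a hundredfold increase in update work, which is the growth pattern the critique describes.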

Beaten on benchmarks

Head-to-head results where a newer method reports beating A-MEM. Values are copied from the source paper's tables — verify against the cited paper.

HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
HAGE beats A-MEM · Overall [GPT-4o-mini] · 0.739 vs 0.580
HAGE beats A-MEM · Overall [Qwen2.5-3B] · 0.548 vs 0.410
HAGE beats A-MEM · LLM Score [GPT-4o-mini] · 0.824 vs 0.547
HAGE beats A-MEM · F1 [GPT-4o-mini] · 0.678 vs 0.433
HAGE beats A-MEM · LLM Score [Qwen2.5-3B] · 0.527 vs 0.416
HAGE beats A-MEM · F1 [Qwen2.5-3B] · 0.429 vs 0.186
HAGE beats A-MEM · Avg. Score [all] · 0.739 vs 0.580

D-MEM: Dopamine-Gated Agentic Memory via Reward Prediction Error Routing
D-MEM beats A-MEM · Overall F1 [clean LoCoMo benchmark] · 37.4 vs 35.94
D-MEM beats A-MEM · Multi Hop F1 [clean LoCoMo benchmark] · 42.7 vs 27.0
D-MEM beats A-MEM · Overall F1 [extreme noise ρ=0.75] · 0.369 vs 0.336

MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
MemGuard beats A-MEM · Avg. [LoCoMo (Base LLM: GPT-4.1-mini, Judge LLM: GPT-4.1)] · 77.29 vs 59.62

GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats A-MEM · LLM-judge accuracy [LongMemEval Micro] · 63.2 vs 53.8
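
Because the results above are reported on different scales (some as fractions, some as percentages), the relative gain over A-MEM is one scale-independent way to read them. The sketch below uses a subset of the values listed above; `relative_gain` is an illustrative helper, not something defined by any of the cited papers, and gains measured on different benchmarks or with different base models are not directly comparable.

```python
# (method, metric [setting], newer method's score, A-MEM's score),
# copied from the list above; verify against the cited papers.
results = [
    ("HAGE", "F1 [GPT-4o-mini]", 0.678, 0.433),
    ("HAGE", "F1 [Qwen2.5-3B]", 0.429, 0.186),
    ("D-MEM", "Overall F1 [clean LoCoMo benchmark]", 37.4, 35.94),
    ("D-MEM", "Multi Hop F1 [clean LoCoMo benchmark]", 42.7, 27.0),
    ("MemGuard", "Avg. [LoCoMo]", 77.29, 59.62),
    ("GRAVITY", "LLM-judge accuracy [LongMemEval Micro]", 63.2, 53.8),
]


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, as a percentage.

    The ratio is unchanged if both scores are rescaled by the same factor,
    so fraction-scale and percent-scale results can share one column.
    """
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for method, metric, new, old in results:
        print(f"{method:9s} {metric:42s} +{relative_gain(new, old):.1f}%")
```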

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.