Mem0 (Agent / long-term memory): heavily superseded, a standard baseline that newer methods routinely beat. 3 papers critique it and 15 beat it on benchmarks, ranking it #1 of 63 most-superseded. Sub-problem: cluster led by Mem0. Newer alternatives in the same sub-problem include REAL, MemPro, MemGuard, DeferMem, Memory-R2.

Heavily superseded · #1 of 63 most-superseded

Mem0

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Agent / long-term memory · first seen Apr 28, 2025

heavily superseded — a standard baseline that newer methods routinely beat

3 papers critique it · 15 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Mem0 as a baseline.

“While graph databases provide structured organization for memory systems, their reliance on predefined schemas and relationships fundamentally limits their adaptability.”
— A-MEM: Agentic Memory for LLM Agents
“LLMs cannot use memory tools effectively and using such tools increases redundancy in planning and overall tool use.”
— MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
“they operate in an ‘open loop’ without feedback on whether the constructed memories benefit downstream tasks”
— MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards

Beaten on benchmarks

Head-to-head results where a newer method reports beating Mem0. Values are copied from the source paper's tables — verify against the cited paper.

Mnemosyne beats Mem0 · Overall (%) [LoCoMo benchmark]
54.55 vs 45.68
Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
SwiftMem beats Mem0 · BLEU-1 [Temporal Reasoning]
0.569 vs 0.332
SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Mem0 · BLEU-1 [Overall]
0.467 vs 0.365
SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Mem0 · Search latency [Overall]
11 vs 784 (lower is better)
SwiftMem: Fast Agentic Memory via Query-aware Indexing
MemGuard beats Mem0 · Acc. (%) [HaluMem extraction]
89.53 vs 60.86
MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
MemGuard beats Mem0 · C [HaluMem update]
70.79 vs 25.50
MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
GRAVITY beats Mem0 · LLM-judge accuracy [LongMemEval Micro]
64.0 vs 52.6
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Mem0 · LLM-judge accuracy [LongMemEval Macro]
64.0 vs 51.7
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Mem0 · LLM-judge accuracy [LoCoMo]
59.1 vs 51.0
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
DeltaMem beats Mem0 · LJ [Overall]
75.13 vs 57.86
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
DeltaMem beats Mem0 · QA [Question Answering]
66.43 vs 53.02
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
gpt-5+NoMem beats Mem0 · Efficiency [GPT-5]
0.667 vs 0.656
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
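
The listing above mixes metrics with different scales and directions, so the raw pairs are hard to compare at a glance. The following is a minimal sketch, not part of any cited paper, that turns each pair into an absolute gap and a gap relative to Mem0's score. The `Result` type, the `margin` helper, and the `lower_is_better` flag are hypothetical names introduced here for illustration. The sketch assumes that search latency is the only lower-is-better metric and that every other metric is higher-is-better. The values are copied from the listing and should still be verified against the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    challenger: str
    metric: str
    challenger_value: float
    mem0_value: float
    # Assumption: only latency-style metrics are lower-is-better.
    lower_is_better: bool = False


# Values copied from the listing above (challenger vs Mem0).
RESULTS = [
    Result("Mnemosyne", "Overall (%) [LoCoMo benchmark]", 54.55, 45.68),
    Result("SwiftMem", "BLEU-1 [Temporal Reasoning]", 0.569, 0.332),
    Result("SwiftMem", "BLEU-1 [Overall]", 0.467, 0.365),
    Result("SwiftMem", "Search latency [Overall]", 11, 784, lower_is_better=True),
    Result("MemGuard", "Acc. (%) [HaluMem extraction]", 89.53, 60.86),
    Result("MemGuard", "C [HaluMem update]", 70.79, 25.50),
    Result("GRAVITY", "LLM-judge accuracy [LongMemEval Micro]", 64.0, 52.6),
    Result("GRAVITY", "LLM-judge accuracy [LongMemEval Macro]", 64.0, 51.7),
    Result("GRAVITY", "LLM-judge accuracy [LoCoMo]", 59.1, 51.0),
    Result("DeltaMem", "LJ [Overall]", 75.13, 57.86),
    Result("DeltaMem", "QA [Question Answering]", 66.43, 53.02),
    Result("gpt-5+NoMem", "Efficiency [GPT-5]", 0.667, 0.656),
]


def margin(r: Result) -> tuple[float, float]:
    """Return (absolute gap, gap as % of Mem0's value), signed so that
    a positive number means the challenger is ahead."""
    if r.lower_is_better:
        gap = r.mem0_value - r.challenger_value
    else:
        gap = r.challenger_value - r.mem0_value
    return gap, 100.0 * gap / r.mem0_value


if __name__ == "__main__":
    for r in RESULTS:
        gap, rel = margin(r)
        print(f"{r.challenger:<12} {r.metric:<40} gap={gap:8.3f}  rel={rel:6.1f}%")
```

Relative gaps are only comparable within one metric and benchmark. A large percentage on BLEU-1 and a small one on Efficiency come from different papers with different setups, so the output is a reading aid rather than a ranking.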

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.