Zep (Agent / long-term memory): heavily superseded, a standard baseline that newer methods routinely beat. 1 paper critiques it and 9 beat it on benchmarks, ranking it #3 of 63 most-superseded. Sub-problem: cluster led by Mem0. Newer alternatives in the same sub-problem include REAL, MemPro, MemGuard, DeferMem, Memory-R2.

Heavily superseded · #3 of 63 most-superseded

Zep

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Agent / long-term memory · first seen Jan 20, 2025

heavily superseded — a standard baseline that newer methods routinely beat

1 paper critiques it · 9 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Zep as a baseline.

“LLMs cannot use memory tools effectively and using such tools increases redundancy in planning and overall tool use.”
— MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

Beaten on benchmarks

Head-to-head results where a newer method reports beating Zep. Values are copied from the source paper's tables — verify against the cited paper.

Mnemosyne beats Zep · Overall (%) [LoCoMo benchmark]
54.55 vs 42.80
Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
SwiftMem beats Zep · BLEU-1 [Temporal Reasoning]
0.569 vs 0.200
SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Zep · BLEU-1 [Overall]
0.467 vs 0.309
SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Zep · Search latency [Overall]
11 vs 522
SwiftMem: Fast Agentic Memory via Query-aware Indexing
GRAVITY beats Zep · LLM-judge accuracy [LongMemEval Micro]
60.4 vs 48.2
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Zep · LLM-judge accuracy [LongMemEval Macro]
60.9 vs 47.8
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Zep · LLM-judge accuracy [LoCoMo]
61.8 vs 54.7
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
DeltaMem beats Zep · LJ [Overall]
75.13 vs 65.99
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
DeltaMem beats Zep · Overall [Overall]
63.61 vs 56.71
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
gpt-5+NoMem beats Zep · Efficiency [GPT-5]
0.667 vs 0.660
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
gpt-5+NoMem beats Zep · Redundancy [GPT-5]
0.206 vs 0.214
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
gemini-2.5-pro+NoMem beats Zep · Correctness [Gemini-2.5-pro]
0.144 vs 0.140
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
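
The entries above mix metrics where a higher value wins (accuracy, BLEU-1) with metrics where a lower value presumably wins (search latency, redundancy), so raw differences are not directly comparable. A minimal sketch of how the margins could be normalized follows. The `lower_is_better` flags are assumptions inferred from the metric names, not stated by the source papers, and `relative_gain` is a hypothetical helper introduced only for illustration.

```python
# A few rows copied from the listing above:
# (method, metric, newer_value, zep_value, lower_is_better)
# The lower_is_better flag is an assumption inferred from the metric name.
RESULTS = [
    ("Mnemosyne", "Overall (%) [LoCoMo benchmark]", 54.55, 42.80, False),
    ("SwiftMem", "Search latency [Overall]", 11, 522, True),
    ("GRAVITY", "LLM-judge accuracy [LoCoMo]", 61.8, 54.7, False),
    ("gpt-5+NoMem", "Redundancy [GPT-5]", 0.206, 0.214, True),
]


def relative_gain(newer: float, zep: float, lower_is_better: bool) -> float:
    """Improvement of the newer method over Zep, as a fraction of Zep's value.

    Positive means the newer method is better under the assumed direction.
    """
    diff = (zep - newer) if lower_is_better else (newer - zep)
    return diff / zep


if __name__ == "__main__":
    for method, metric, newer, zep, lower in RESULTS:
        gain = relative_gain(newer, zep, lower)
        print(f"{method:14s} {metric:34s} {gain:+.1%}")
```

Normalizing by Zep's value makes it easier to see that some reported wins are large (the search latency gap) while others are within a few percent (the MEMTRACK redundancy gap). The underlying numbers should still be verified against the cited papers.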

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include REAL, MemPro, MemGuard, DeferMem, and Memory-R2.