Zep
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
Agent / long-term memory · first seen Jan 20, 2025
heavily superseded — a standard baseline that newer methods routinely beat
1 paper critiques it · 9 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Zep as a baseline.
“LLMs cannot use memory tools effectively and using such tools increases redundancy in planning and overall tool use.”
— MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
Beaten on benchmarks
Head-to-head results where a newer method reports beating Zep. Values are copied from the source paper's tables — verify against the cited paper.
- Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
Mnemosyne beats Zep · Overall (%) [LoCoMo benchmark]
54.55 vs 42.80
- SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Zep · BLEU-1 [Temporal Reasoning]
0.569 vs 0.200
- SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Zep · BLEU-1 [Overall]
0.467 vs 0.309
- SwiftMem: Fast Agentic Memory via Query-aware Indexing
SwiftMem beats Zep · Search latency [Overall]
11 vs 522
- GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Zep · LLM-judge accuracy [LongMemEval Micro]
60.4 vs 48.2
- GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Zep · LLM-judge accuracy [LongMemEval Macro]
60.9 vs 47.8
- GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
GRAVITY beats Zep · LLM-judge accuracy [LoCoMo]
61.8 vs 54.7
- DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
DeltaMem beats Zep · LJ [Overall]
75.13 vs 65.99
- DeltaMem: Towards Agentic Memory Management via Reinforcement Learning
DeltaMem beats Zep · Overall [Overall]
63.61 vs 56.71
- MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
gpt-5+NoMem beats Zep · Efficiency [GPT-5]
0.667 vs 0.660
- MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
gpt-5+NoMem beats Zep · Redundancy [GPT-5]
0.206 vs 0.214
- MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments
gemini-2.5-pro+NoMem beats Zep · Correctness [Gemini-2.5-pro]
0.144 vs 0.140
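The results above mix metrics where a higher value wins (accuracy, BLEU-1) with metrics where a lower value wins (search latency, redundancy), so the raw gaps are not directly comparable. A minimal sketch of one way to normalize them is to express each gap as a fraction of Zep's value. The `RESULTS` table and the `relative_margin` helper are illustrative names, not part of any cited paper. The `lower_is_better` flags are assumptions inferred from which side the listing reports as the winner. Only four of the rows above are included.

```python
# Each tuple: (challenger, metric, challenger_value, zep_value, lower_is_better).
# Values are copied from the listing above; the lower_is_better flags are
# assumptions inferred from which side is reported as "beating" Zep.
RESULTS = [
    ("Mnemosyne", "Overall (%) [LoCoMo]", 54.55, 42.80, False),
    ("SwiftMem", "Search latency [Overall]", 11.0, 522.0, True),
    ("GRAVITY", "LLM-judge accuracy [LoCoMo]", 61.8, 54.7, False),
    ("gpt-5+NoMem", "Redundancy [GPT-5]", 0.206, 0.214, True),
]


def relative_margin(challenger: float, zep: float, lower_is_better: bool) -> float:
    """Return the challenger's advantage as a fraction of Zep's value."""
    diff = (zep - challenger) if lower_is_better else (challenger - zep)
    return diff / zep


if __name__ == "__main__":
    for name, metric, c, z, lower in RESULTS:
        margin = relative_margin(c, z, lower)
        print(f"{name:12s} {metric:30s} {margin:+.1%}")
```

Margins computed this way come from different papers, setups, and judge models, so they indicate the size of each reported gap and should not be read as a ranking across methods.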
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 9, 2026
- May 30, 2026
- MemGuard · MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models · May 27, 2026
- DeferMem · DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA · May 21, 2026
- May 20, 2026
- May 3, 2026
- Apr 23, 2026
- Apr 2, 2026
- Chronos · Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory · Mar 17, 2026
- Mar 15, 2026
- Jan 13, 2026
- Agentic Memory (AgeMem) · Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents · Jan 5, 2026