MemGPT
MemGPT: Towards LLMs as Operating Systems · Agent / long-term memory · first seen Oct 12, 2023
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites MemGPT as a baseline.
“However, these approaches face significant limitations in handling diverse real-world tasks. While they can provide basic memory functionality, their operations are typically constrained by predefined structures and fixed workflows. These constraints stem from their reliance on rigid operational patterns, particularly in memory writing and retrieval processes. Such inflexibility leads to poor generalization in new environments and limited effectiveness in long-term interactions.”
— A-MEM: Agentic Memory for LLM Agents
“they operate in an ‘open loop’ without feedback on whether the constructed memories benefit downstream tasks”
— MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
“While simple to implement, these unstructured memory designs fall short when critical information is dispersed across multiple entries.”
— CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
“Early memory mechanisms in LLM-based agents typically relied on heuristic-based static workflows.”
— AtomMem: Learnable Dynamic Agentic Memory with Atomic Memory Operation
Beaten on benchmarks
Head-to-head results where a newer method reports beating MemGPT. Values are copied from the source paper's tables — verify against the cited paper.
- Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
Mnemosyne beats MemGPT · Overall (%) [LoCoMo benchmark]
54.55 vs 44.18
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Multi Hop F1 [GPT 4o-mini]
27.02 vs 26.65
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Multi Hop BLEU [GPT 4o-mini]
20.09 vs 17.72
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Temporal F1 [GPT 4o-mini]
45.85 vs 25.52
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Temporal BLEU [GPT 4o-mini]
36.67 vs 19.44
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Open Domain F1 [GPT 4o-mini]
12.14 vs 9.15
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Open Domain BLEU [GPT 4o-mini]
12.00 vs 7.44
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Single Hop F1 [GPT 4o-mini]
44.65 vs 41.04
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Single Hop BLEU [GPT 4o-mini]
37.06 vs 34.34
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Multi Hop F1 [GPT 4o]
32.86 vs 30.36
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Multi Hop BLEU [GPT 4o]
23.76 vs 22.83
- A-MEM: Agentic Memory for LLM Agents
A-MEM beats MemGPT · Temporal F1 [GPT 4o]
39.41 vs 17.29
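The size of each reported gap varies a lot across these rows, so it can help to compute the margins rather than read them off one by one. The following is a minimal sketch, not part of the source page: the scores are copied from the list above, while the names `RESULTS`, `margins`, `narrowest`, and `widest` are hypothetical helpers introduced only for illustration. It computes the absolute difference and the relative improvement over MemGPT for each row.

```python
# Rows copied from the head-to-head list above:
# (newer method, metric, setting, newer method's score, MemGPT's score).
RESULTS = [
    ("Mnemosyne", "Overall (%)", "LoCoMo benchmark", 54.55, 44.18),
    ("A-MEM", "Multi Hop F1", "GPT 4o-mini", 27.02, 26.65),
    ("A-MEM", "Multi Hop BLEU", "GPT 4o-mini", 20.09, 17.72),
    ("A-MEM", "Temporal F1", "GPT 4o-mini", 45.85, 25.52),
    ("A-MEM", "Temporal BLEU", "GPT 4o-mini", 36.67, 19.44),
    ("A-MEM", "Open Domain F1", "GPT 4o-mini", 12.14, 9.15),
    ("A-MEM", "Open Domain BLEU", "GPT 4o-mini", 12.00, 7.44),
    ("A-MEM", "Single Hop F1", "GPT 4o-mini", 44.65, 41.04),
    ("A-MEM", "Single Hop BLEU", "GPT 4o-mini", 37.06, 34.34),
    ("A-MEM", "Multi Hop F1", "GPT 4o", 32.86, 30.36),
    ("A-MEM", "Multi Hop BLEU", "GPT 4o", 23.76, 22.83),
    ("A-MEM", "Temporal F1", "GPT 4o", 39.41, 17.29),
]


def margins(rows):
    """Return (method, metric, setting, absolute gap, relative gap in %) per row."""
    out = []
    for method, metric, setting, new, base in rows:
        delta = round(new - base, 2)              # absolute gap in score points
        rel = round(100 * (new - base) / base, 1)  # gap relative to MemGPT's score
        out.append((method, metric, setting, delta, rel))
    return out


def narrowest(rows):
    """Row with the smallest absolute gap over MemGPT."""
    return min(margins(rows), key=lambda r: r[3])


def widest(rows):
    """Row with the largest absolute gap over MemGPT."""
    return max(margins(rows), key=lambda r: r[3])


if __name__ == "__main__":
    for method, metric, setting, delta, rel in margins(RESULTS):
        print(f"{method:<10} {metric:<17} [{setting}]  +{delta:.2f} pts  (+{rel:.1f}%)")
```

On the rows listed here, the narrowest gap is A-MEM's Multi Hop F1 with GPT 4o-mini (0.37 points) and the widest is A-MEM's Temporal F1 with GPT 4o (22.12 points). The scores come from different papers and evaluation setups, so margins are only comparable within a single source.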
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- HingeMem · HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues · Apr 8, 2026
- Jan 13, 2026
- Nov 25, 2025
- Generative Semantic Workspace (GSW) · Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces · Nov 10, 2025
- Oct 7, 2025
- PREMem · Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue · Sep 13, 2025