RAPTOR
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Agent / long-term memory · first seen Jan 31, 2024
heavily superseded — a standard baseline that newer methods routinely beat
3 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites RAPTOR as a baseline.
“Existing approaches can be viewed as constructing a semantic abstraction tree where leaf nodes represent low-level text units and internal nodes summarize or route over their children. They differ primarily in how this hierarchy is formed. RAPTOR and LATTICE build semantic hierarchies through clustering.”
— Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents
“Although these works effectively structure large-scale textual data to enhance retrieval and generation capabilities, they are confined to static corpora, requiring complete reconstruction to integrate new information.”
— CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
“RAPTOR's performance deteriorates substantially on the simple and multi-hop QA tasks due to the noise introduced into the retrieval corpora by its LLM summarization mechanism.”
— From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
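The tree structure described in the first quote (leaf nodes hold text units, internal nodes summarize their children, and the hierarchy is formed by clustering) can be sketched roughly as follows. This is a toy illustration, not RAPTOR's implementation: `Node`, `embed`, `cluster`, `summarize`, and `build_tree` are hypothetical names, the embedding is a bag-of-words count, the clustering is a greedy similarity threshold, and the "summary" is plain concatenation standing in for an LLM call.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    text: str
    children: list["Node"] = field(default_factory=list)


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes: list[Node], threshold: float = 0.2) -> list[list[Node]]:
    # Greedy single pass: join the first cluster whose seed node is similar
    # enough, otherwise open a new cluster. An assumption for illustration only.
    clusters: list[list[Node]] = []
    for node in nodes:
        for group in clusters:
            if cosine(embed(node.text), embed(group[0].text)) >= threshold:
                group.append(node)
                break
        else:
            clusters.append([node])
    return clusters


def summarize(nodes: list[Node]) -> str:
    # Placeholder for an LLM summarization call: concatenate the child texts.
    return " ".join(n.text for n in nodes)


def build_tree(chunks: list[str]) -> Node:
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [Node(c) for c in chunks]  # leaves: the raw text units
    while len(level) > 1:
        groups = cluster(level)
        if len(groups) == len(level):
            # Nothing merged; force a single root so the loop terminates.
            groups = [level]
        level = [Node(summarize(g), g) for g in groups]
    return level[0]


chunks = ["cats chase mice", "cats sleep often", "stocks fell today", "stocks rose today"]
root = build_tree(chunks)
```

In this sketch the only way to integrate a new chunk is to call `build_tree` again on the full list, which is the kind of complete reconstruction the CAM quote objects to. Because every internal node's text comes from `summarize`, a weak summarizer would also add noisy entries to whatever index is built over the nodes, which is the concern raised in the third quote.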
Beaten on benchmarks
Head-to-head results where a newer method reports beating RAPTOR. Values are copied from the source paper's tables — verify against the cited paper.
- Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents · MEM² beats RAPTOR (values are MEM² vs RAPTOR)
  - LLM-judge accuracy [LoCoMo]: 0.639 vs 0.593
  - F1 score [LoCoMo]: 0.301 vs 0.223
  - LLM-judge accuracy [LongMemEval-MAB]: 0.647 vs 0.523
  - F1 score [LongMemEval-MAB]: 0.197 vs 0.166
  - LLM-judge accuracy [RealMem]: 0.621 vs 0.579
  - F1 score [RealMem]: 0.356 vs 0.312
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models · HippoRAG beats RAPTOR (values are HippoRAG vs RAPTOR)
  - R@2 [MuSiQue dataset]: 40.9 vs 35.7
  - R@5 [MuSiQue dataset]: 51.9 vs 45.3
  - R@2 [2Wiki dataset]: 70.7 vs 46.3
  - R@5 [2Wiki dataset]: 89.1 vs 53.8
  - R@2 [HotpotQA dataset]: 60.5 vs 58.1
  - R@5 [HotpotQA dataset]: 77.7 vs 71.2
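For readers unfamiliar with the metrics above, here is a minimal sketch of how recall at k (R@k) and a token-level F1 score are commonly computed. The function names and the whitespace tokenization are assumptions for illustration; the cited papers may normalize answers or define relevance differently, so this should not be read as their evaluation code.

```python
from collections import Counter


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear among the top-k retrieved items."""
    if not relevant:
        return 0.0
    hits = relevant.intersection(retrieved[:k])
    return len(hits) / len(relevant)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (lowercased, whitespace split)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # occurs in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking with two supporting passages, d1 and d2.
ranking = ["d1", "d7", "d3", "d4", "d2"]
r_at_2 = recall_at_k(ranking, {"d1", "d2"}, k=2)
r_at_5 = recall_at_k(ranking, {"d1", "d2"}, k=5)
f1 = token_f1("the cat sat", "the cat")
```

Note that the R@k values listed above appear to be percentages, while the F1 and LLM-judge values are fractions between 0 and 1.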
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- HingeMem · HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues · Apr 8, 2026
- Jan 13, 2026
- Nov 25, 2025
- Generative Semantic Workspace (GSW) · Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces · Nov 10, 2025
- Oct 7, 2025
- PREMem · Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue · Sep 13, 2025