Is RAPTOR superseded?

RAPTOR (Agent / long-term memory) is heavily superseded: it is a standard baseline that newer methods routinely beat. 3 papers critique it and 4 beat it on benchmarks, which ranks it #5 of 63 most-superseded. Sub-problem: cluster led by RAPTOR. Newer alternatives in the same sub-problem include HingeMem, AtomMem, REMem, Generative Semantic Workspace (GSW), and CAM.

Heavily superseded · #5 of 63 most-superseded

RAPTOR

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Agent / long-term memory · first seen Jan 31, 2024

3 papers critique it · 4 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites RAPTOR as a baseline.

“Existing approaches can be viewed as constructing a semantic abstraction tree where leaf nodes represent low-level text units and internal nodes summarize or route over their children. They differ primarily in how this hierarchy is formed. RAPTOR and LATTICE build semantic hierarchies through clustering.”
— Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents
“Although these works effectively structure large-scale textual data to enhance retrieval and generation capabilities, they are confined to static corpora, requiring complete reconstruction to integrate new information.”
— CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
“RAPTOR's performance deteriorates substantially on the simple and multi-hop QA tasks due to the noise introduced into the retrieval corpora by its LLM summarization mechanism.”
— From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
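
The first quote describes a semantic abstraction tree: leaf nodes hold low-level text units, internal nodes summarize their children, and RAPTOR forms the hierarchy through clustering. A minimal Python sketch of that shape follows. Everything in it is an assumption made for illustration and not RAPTOR's implementation: `embed` is a toy bag-of-words vector, `cluster` is a greedy fixed-size grouping standing in for a real clustering algorithm, and `summarize` is a stub where an LLM summarizer would go.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list[Node] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real system would use a dense encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes: list[Node], size: int) -> list[list[Node]]:
    # Greedy stand-in for clustering (assumption): take the first unassigned node
    # as a seed, then group it with its most similar unassigned neighbours.
    remaining = list(nodes)
    groups: list[list[Node]] = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        groups.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return groups


def summarize(nodes: list[Node]) -> str:
    # Placeholder for an LLM summarizer (assumption): joins the children's text.
    return " | ".join(n.text for n in nodes)


def build_tree(chunks: list[str], size: int = 2) -> Node:
    if not chunks:
        raise ValueError("need at least one chunk")
    if size < 2:
        raise ValueError("size must be at least 2 so each level shrinks")
    level = [Node(c) for c in chunks]  # leaf nodes: raw text units
    while len(level) > 1:
        # Each internal node summarizes one cluster of the level below it.
        level = [Node(summarize(group), group) for group in cluster(level, size)]
    return level[0]
```

In this sketch, integrating a new chunk means calling `build_tree` again on the full corpus, which mirrors the static-corpus limitation raised in the CAM quote.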

Beaten on benchmarks

Head-to-head results in which a newer method reports beating RAPTOR. Values are copied from the source papers' tables; verify them against the cited paper.

MEM² beats RAPTOR
Source: Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

Benchmark         Metric               MEM²    RAPTOR
LoCoMo            LLM-judge accuracy   0.639   0.593
LoCoMo            F1 score             0.301   0.223
LongMemEval-MAB   LLM-judge accuracy   0.647   0.523
LongMemEval-MAB   F1 score             0.197   0.166
RealMem           LLM-judge accuracy   0.621   0.579
RealMem           F1 score             0.356   0.312

HippoRAG beats RAPTOR
Source: HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

Dataset    Metric   HippoRAG   RAPTOR
MuSiQue    R@2      40.9       35.7
MuSiQue    R@5      51.9       45.3
2Wiki      R@2      70.7       46.3
2Wiki      R@5      89.1       53.8
HotpotQA   R@2      60.5       58.1
HotpotQA   R@5      77.7       71.2
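
The results above report R@2, R@5, and F1 without defining them. A minimal sketch of the conventional definitions follows, assuming R@k means the fraction of gold supporting passages found among the top k retrieved results and F1 means token-overlap F1 between a predicted and a reference answer. The cited papers may compute these differently (answer normalization, averaging over questions), so the function names and the whitespace tokenization here are illustrative only.

```python
from collections import Counter


def recall_at_k(ranked_ids: list[str], gold_ids: set[str], k: int) -> float:
    """Fraction of gold passages that appear among the top-k retrieved ids."""
    if not gold_ids:
        return 0.0
    hits = gold_ids & set(ranked_ids[:k])
    return len(hits) / len(gold_ids)


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 using lowercase whitespace tokens (assumed tokenization)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # occurs in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, with retrieved ids `["p3", "p1", "p7", "p2", "p9"]` and gold passages `{"p1", "p2"}`, `recall_at_k` gives 0.5 at k=2 and 1.0 at k=5. Benchmark scores such as those above would typically be these per-question values averaged over the dataset.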

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.