Is HippoRAG superseded?

HippoRAG (Agent / long-term memory) is superseded: newer methods cite it as a baseline and beat it. 4 papers critique it and 2 beat it on benchmarks, which ranks it #9 of 63 most-superseded methods. Its sub-problem is the cluster led by RAPTOR. Newer alternatives in the same sub-problem include HingeMem, AtomMem, REMem, Generative Semantic Workspace (GSW), and CAM.

Superseded baseline · #9 of 63 most-superseded

HippoRAG

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

Agent / long-term memory · first seen May 23, 2024

superseded — cited as a baseline and beaten by newer methods

4 papers critique it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites HippoRAG as a baseline.

“Some recent approaches (e.g. RAPTOR, GraphRAG, HippoRAG, and MemTree) recognize the importance of memory structurality, yet none simultaneously embodies the flexibility and dynamicity during memory structure development.”
— CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
“Every index-based method (one that pre-builds a structured store such as a graph, summary notes, or multi-store cache) lags long context on at least one benchmark”
— Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
“HippoRAG's performance drops most on large-scale discourse understanding due to its lack of query-based contextualization”
— From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
“it relies on a single unified index for the entire events with fixed Top-k retrieval”
— HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Beaten on benchmarks

Head-to-head results where a newer method reports beating HippoRAG. Values are copied from the source paper's tables — verify against the cited paper.

F1 score on MuSiQue [Llama-3.3-70B-Instruct QA reader with NV-Embed-v2 retriever]
48.6 (newer method) vs 35.1 (HippoRAG)
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

F1 score on NarrativeQA [Llama-3.3-70B-Instruct QA reader with NV-Embed-v2 retriever]
25.9 (newer method) vs 16.3 (HippoRAG)
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Passage recall@5 on MuSiQue [Llama-3.3-70B-Instruct structure generation with NV-Embed-v2 retriever]
74.7 (newer method) vs 53.2 (HippoRAG)
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Passage recall@5 on HotpotQA [Llama-3.3-70B-Instruct structure generation with NV-Embed-v2 retriever]
96.3 (newer method) vs 77.3 (HippoRAG)
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
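
To make the size of each gap easier to compare, the four head-to-head results above can be turned into absolute and relative margins. The snippet below is only a sketch: the `results` list and the `margin` helper are illustrative names that do not come from the page or the cited paper, and the numbers are transcribed from the entries above, so they should be checked against the paper's tables.

```python
# Scores transcribed from the head-to-head entries above (newer method vs HippoRAG).
# The tuple layout and the helper name are assumptions made for this illustration.
results = [
    ("F1", "MuSiQue", 48.6, 35.1),
    ("F1", "NarrativeQA", 25.9, 16.3),
    ("passage recall@5", "MuSiQue", 74.7, 53.2),
    ("passage recall@5", "HotpotQA", 96.3, 77.3),
]


def margin(newer: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of the baseline)."""
    absolute = round(newer - baseline, 1)
    relative = round(100 * (newer - baseline) / baseline, 1)
    return absolute, relative


for metric, dataset, newer, hipporag in results:
    absolute, relative = margin(newer, hipporag)
    print(f"{metric} on {dataset}: +{absolute} points ({relative}% over HippoRAG)")
```

The absolute margins work out to 13.5 and 9.6 F1 points on MuSiQue and NarrativeQA, and 21.5 and 19.0 recall@5 points on MuSiQue and HotpotQA. All four results come from the same paper, so they reflect a single experimental setup.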

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.