HippoRAG
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Agent / long-term memory · first seen May 23, 2024
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites HippoRAG as a baseline.
“Some recent approaches (e.g. RAPTOR, GraphRAG, HippoRAG, and MemTree) recognize the importance of memory structurality, yet none simultaneously embodies the flexibility and dynamicity during memory structure development.”
— CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
“Every index-based method (one that pre-builds a structured store such as a graph, summary notes, or multi-store cache) lags long context on at least one benchmark”
— Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
“HippoRAG's performance drops most on large-scale discourse understanding due to its lack of query-based contextualization”
— From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
“it relies on a single unified index for the entire events with fixed Top-k retrieval”
— HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues
Beaten on benchmarks
Head-to-head results where a newer method reports beating HippoRAG. Values are copied from the source paper's tables — verify against the cited paper.
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
48.6 beats HippoRAG · F1 score on MuSiQue [Llama-3.3-70B-Instruct QA reader with NV-Embed-v2 retriever]
48.6 vs 35.1
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
25.9 beats HippoRAG · F1 score on NarrativeQA [Llama-3.3-70B-Instruct QA reader with NV-Embed-v2 retriever]
25.9 vs 16.3
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
74.7 beats HippoRAG · passage recall@5 on MuSiQue [Llama-3.3-70B-Instruct structure generation with NV-Embed-v2 retriever]
74.7 vs 53.2
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
96.3 beats HippoRAG · passage recall@5 on HotpotQA [Llama-3.3-70B-Instruct structure generation with NV-Embed-v2 retriever]
96.3 vs 77.3
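The head-to-head values above can be turned into margins with a few lines of arithmetic. The following is a minimal sketch and not part of the source page: the names `RESULTS` and `margins` are hypothetical, and the scores are copied from the entries above, so they should still be verified against the cited paper.

```python
# Scores copied from the entries above:
# (benchmark, metric, newer method's score, HippoRAG's score).
RESULTS = [
    ("MuSiQue", "F1", 48.6, 35.1),
    ("NarrativeQA", "F1", 25.9, 16.3),
    ("MuSiQue", "passage recall@5", 74.7, 53.2),
    ("HotpotQA", "passage recall@5", 96.3, 77.3),
]


def margins(results):
    """Return (benchmark, metric, absolute gap in points, relative gain in %)."""
    rows = []
    for benchmark, metric, newer, baseline in results:
        gap = round(newer - baseline, 1)                      # absolute gap, in points
        rel = round(100 * (newer - baseline) / baseline, 1)   # gain relative to HippoRAG
        rows.append((benchmark, metric, gap, rel))
    return rows


if __name__ == "__main__":
    for benchmark, metric, gap, rel in margins(RESULTS):
        print(f"{benchmark:12s} {metric:17s} +{gap:.1f} points ({rel:+.1f}% relative)")
```

Under these values the absolute gaps are 13.5 and 9.6 points on F1 and 21.5 and 19.0 points on passage recall@5. All four rows come from the same paper and the same reader and retriever setup, so they describe one comparison rather than four independent ones.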
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- HingeMem · HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues · Apr 8, 2026
- Jan 13, 2026
- Nov 25, 2025
- Generative Semantic Workspace (GSW) · Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces · Nov 10, 2025
- Oct 7, 2025
- PREMem · Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue · Sep 13, 2025