Is HippoRAG 2 superseded?

HippoRAG 2 (Agent / long-term memory) is superseded: it is cited as a baseline and beaten by newer methods. One paper critiques it and five beat it on benchmarks, ranking it #7 of 63 most-superseded methods. Sub-problem: the cluster led by RAPTOR. Newer alternatives in the same sub-problem include HingeMem, AtomMem, REMem, Generative Semantic Workspace (GSW), and CAM.

HippoRAG 2

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Agent / long-term memory · first seen Feb 20, 2025

What papers say

Verbatim critique sentences, each from a paper that cites HippoRAG 2 as a baseline.

“However, these methods typically define cross-session relationships as simple clusters without modeling the nature of relationships or temporal evolution.”
— Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue

Beaten on benchmarks

Head-to-head results in which a newer method reports beating HippoRAG 2. Values are copied from the source papers' tables; verify them against the cited paper.

Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces
GSW beats HippoRAG 2 · Precision [Overall Precision]: 0.865 vs 0.812
GSW beats HippoRAG 2 · Recall [Overall Recall]: 0.894 vs 0.787
GSW beats HippoRAG 2 · F1 [Overall F1]: 0.850 vs 0.753

REMem: Reasoning with Episodic Memory in Language Agent
REMem-I beats HippoRAG 2 · F1 [Complex-TR]: 83.3 vs 78.2
REMem-I beats HippoRAG 2 · BLEU-1 [Complex-TR]: 77.6 vs 72.7
REMem-I beats HippoRAG 2 · EM [Test of Time]: 93.1 vs 66.9
REMem-I beats HippoRAG 2 · F1 [LoCoMo]: 42.4 vs 39.0
REMem-I beats HippoRAG 2 · BLEU-1 [LoCoMo]: 32.7 vs 30.8
REMem-I beats HippoRAG 2 · F1 [REALTALK]: 25.6 vs 21.9
REMem-I beats HippoRAG 2 · BLEU-1 [REALTALK]: 18.1 vs 16.2

From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
MemGAS beats HippoRAG 2 · GPT4o-as-Judge [LongMemEval-s]: 60.20 vs 57.60

Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue
PREMem beats HippoRAG 2 · LLM-as-a-judge score [Qwen2.5-14B on LongMemEval]: 64.73 vs 44.69
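The margins are easier to compare when computed directly. The sketch below is a hypothetical helper that takes the numbers exactly as listed. The names `RESULTS`, `margins`, and `harmonic_f1` are illustrative and do not come from any cited paper. It reports each absolute margin in the source table's own units and each relative margin in percent. It also checks whether GSW's Overall F1 equals the harmonic mean of its Overall Precision and Recall. It does not: the harmonic mean is about 0.879 against a reported 0.850, and about 0.799 against 0.753 for HippoRAG 2. One possible reason is that F1 is averaged per query, but that is an assumption to confirm in the paper itself.

```python
# Head-to-head rows as listed above: (method, metric, benchmark, new, hipporag2).
# Units follow each source table (GSW reports 0-1 fractions, the others 0-100
# scores), so absolute margins are only comparable within a single row.
RESULTS = [
    ("GSW", "Precision", "Overall", 0.865, 0.812),
    ("GSW", "Recall", "Overall", 0.894, 0.787),
    ("GSW", "F1", "Overall", 0.850, 0.753),
    ("REMem-I", "F1", "Complex-TR", 83.3, 78.2),
    ("REMem-I", "BLEU-1", "Complex-TR", 77.6, 72.7),
    ("REMem-I", "EM", "Test of Time", 93.1, 66.9),
    ("REMem-I", "F1", "LoCoMo", 42.4, 39.0),
    ("REMem-I", "BLEU-1", "LoCoMo", 32.7, 30.8),
    ("REMem-I", "F1", "REALTALK", 25.6, 21.9),
    ("REMem-I", "BLEU-1", "REALTALK", 18.1, 16.2),
    ("MemGAS", "GPT4o-as-Judge", "LongMemEval-s", 60.20, 57.60),
    ("PREMem", "LLM-as-a-judge", "LongMemEval (Qwen2.5-14B)", 64.73, 44.69),
]


def margins(new: float, base: float) -> tuple[float, float]:
    """Return (absolute margin, relative margin in percent) of new over base."""
    return new - base, 100.0 * (new - base) / base


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for method, metric, bench, new, base in RESULTS:
        abs_m, rel_m = margins(new, base)
        print(f"{method:8} {metric:16} {bench:26} {abs_m:+.3g} ({rel_m:+.1f}%)")
    # If Overall F1 were derived from Overall Precision and Recall, these values
    # would match the reported 0.850 (GSW) and 0.753 (HippoRAG 2).
    print(f"GSW harmonic F1:        {harmonic_f1(0.865, 0.894):.3f}")
    print(f"HippoRAG 2 harmonic F1: {harmonic_f1(0.812, 0.787):.3f}")
```

The largest relative gaps are PREMem on LongMemEval and REMem-I on Test of Time. A relative margin is only meaningful within one benchmark and metric, because the rows use different scales and evaluation setups.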

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.