MIRIX
MIRIX: Multi-Agent Memory System for LLM-Based Agents
Agent / long-term memory · first seen Jul 10, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 papers beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites MIRIX as a baseline.
“they operate in an ‘open loop’ without feedback on whether the constructed memories benefit downstream tasks”
— MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
Beaten on benchmarks
Head-to-head results where a newer method reports beating MIRIX. Values are copied from the source paper's tables — verify against the cited paper.
- MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
MemBuilder beats MIRIX · accuracy [LoCoMo benchmark]
84.23 vs 77.48
- MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
MemBuilder beats MIRIX · accuracy [LongMemEval benchmark]
85.75 vs 73.25
- MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards
MemBuilder beats MIRIX · accuracy [PerLTQA benchmark]
93.14 vs 83.11
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · G-EVAL [Overall]
2.81 vs 2.75
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · Judge Acc [Overall]
76.5 vs 75.9
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · F1 [Overall]
0.177 vs 0.113
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · ROUGE-L [Overall]
0.170 vs 0.108
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · BERTScore [Overall]
0.852 vs 0.840
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · G-EVAL [Multi-hop]
2.89 vs 2.46
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · Judge Acc [Multi-hop]
85.2 vs 65.9
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · F1 [Multi-hop]
0.260 vs 0.099
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem beats MIRIX · ROUGE-L [Multi-hop]
0.239 vs 0.091
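The metrics above use different scales, so the raw gaps are hard to compare at a glance. The Python sketch below puts a subset of the listed results on a common footing by computing absolute and relative margins over MIRIX. The scores are copied from the entries above and the method labels come from the cited paper titles. The `RESULTS` layout and the `margin` and `summarize` helpers are illustrative names assumed for this example, not part of any cited paper or codebase.

```python
# Scores copied from the head-to-head entries above:
# (newer method, metric, split or benchmark, newer score, MIRIX score).
# The tuple layout is an assumption made for this sketch.
RESULTS = [
    ("MemBuilder", "accuracy", "LoCoMo", 84.23, 77.48),
    ("MemBuilder", "accuracy", "LongMemEval", 85.75, 73.25),
    ("MemBuilder", "accuracy", "PerLTQA", 93.14, 83.11),
    ("EviMem", "G-EVAL", "Overall", 2.81, 2.75),
    ("EviMem", "Judge Acc", "Overall", 76.5, 75.9),
    ("EviMem", "F1", "Overall", 0.177, 0.113),
    ("EviMem", "F1", "Multi-hop", 0.260, 0.099),
]


def margin(newer, baseline):
    """Return (absolute gain, relative gain in percent) of newer over baseline."""
    absolute = newer - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 3), round(relative, 1)


def summarize(results):
    """Attach absolute and relative margins to each head-to-head row."""
    rows = []
    for method, metric, split, newer, baseline in results:
        absolute, relative = margin(newer, baseline)
        rows.append((method, metric, split, absolute, relative))
    return rows


if __name__ == "__main__":
    for method, metric, split, absolute, relative in summarize(RESULTS):
        print(f"{method:<10} {metric:<9} {split:<11} +{absolute:<6} (+{relative}%)")
```

Relative margins show that the gaps differ widely: the G-EVAL Overall gap is about 2%, while the Multi-hop F1 score is more than double MIRIX's. The margins are only as reliable as the copied values, so the advice to verify against the cited papers still applies.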
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.