BM25 (retrieval-augmented generation): superseded. It is cited as a baseline and beaten by newer methods: 6 papers critique it and 16 beat it on benchmarks, which ranks it #10 of 1179 among the most-superseded methods. Sub-problem: the cluster led by BM25. Newer alternatives in the same sub-problem include Beyond Topical Similarity; Experience-RAG Skill; LFRAG; Don't Retrieve, Navigate; and MIGRASCOPE.

Is BM25 superseded? Critiques, benchmarks & alternatives

What papers say

Verbatim critique sentences, each from a paper that cites BM25 as a baseline.

“This paradigm inevitably discards critical visual and structural information, including the row-column relationships of tables, data trends of charts, and layout logic between text and figures.”
— LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding
“The semantic and lexical gaps limit the effectiveness of sparse models like BM25 and TF-IDF, which operate on keyword matching”
— ModernBERT + ColBERT: Enhancing biomedical RAG through an advanced re-ranking retriever
“Despite their widespread use, many RAG systems rely on static, off-the-shelf retrieval modules — e.g., BM25 ... that are minimally adapted to the downstream task or domain.”
— Test-time Corpus Feedback: From Retrieval to RAG
“We prioritized a dense retrieval approach over sparse methods (such as BM25) because layperson queries often lack the precise legal terminology found in regulatory documents.”
— Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation
“They offer computational efficiency but lack deeper semantic comprehension.”
— MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
“However, many practical systems still assume that one fixed retrieval strategy is sufficient across all tasks. This assumption is problematic in realistic agent settings.”
— An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
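
The lexical gap that several of the quoted papers describe can be illustrated with a small Okapi BM25 scorer. This is a minimal sketch, not code from any of the cited papers: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Assumptions: whitespace-tokenized input, smoothed IDF
    log((N - df + 0.5) / (df + 0.5) + 1), and common default k1 / b values.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # no exact token match means no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical two-document corpus.
docs = [
    "the policy covers water damage to the insured dwelling".split(),
    "premiums are due on the first day of each month".split(),
]

# A layperson query that shares no exact token with the relevant document.
print(bm25_scores("flood insurance house".split(), docs))

# A query that reuses the document's own wording.
print(bm25_scores("water damage".split(), docs))
```

In this toy corpus the first query scores zero against both documents, because "insurance" and "insured" are different tokens and "flood" and "house" never appear. The second query scores above zero only for the first document. Real systems usually add stemming or stopword handling, which narrows this gap without closing it.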

Beaten on benchmarks

Head-to-head results where a newer method reports beating BM25. Values are copied from the source paper's tables — verify against the cited paper.

GraphRAFT beats BM25 · Hit@1 [STARK-PRIME]
63.71 vs 12.75
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · Hit@5 [STARK-PRIME]
75.39 vs 27.92
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · R@20 [STARK-PRIME]
76.39 vs 31.25
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · MRR [STARK-PRIME]
68.99 vs 19.84
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · Hit@1 [STARK-MAG]
69.64 vs 25.85
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · Hit@5 [STARK-MAG]
84.32 vs 45.25
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · R@20 [STARK-MAG]
89.12 vs 45.69
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
GraphRAFT beats BM25 · MRR [STARK-MAG]
76.24 vs 34.91
GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases
QMKGF beats BM25 · R-1 [HotpotQA]
64.98 vs 54.86
A Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval-Augmented Generation in Large Language Models
QMKGF beats BM25 · R-1 [MuSiQue]
47.42 vs 38.53
A Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval-Augmented Generation in Large Language Models
PURPLE beats BM25 · ROUGE-1 [Phi-4-Mini-Instruct (3.84B)]
26.2 vs 25.2
Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization
Corpus2Skill beats BM25 · F1 [WixQA benchmark]
0.456 vs 0.345
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
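
For readers unfamiliar with the ranking metrics named above, the following sketch shows how Hit@k, Recall@k (written R@20 above), and MRR are commonly computed. These are the textbook definitions and are an assumption here: the cited papers may define or average them differently. The function names and the toy rankings are hypothetical. Values like those listed above appear to be percentages, so a mean of 0.5 from this code would correspond to 50.0.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical results for two queries: (ranked document ids, relevant ids).
runs = [
    (["d3", "d1", "d7"], {"d1"}),
    (["d2", "d9", "d4"], {"d2", "d4", "d8"}),
]

print("Hit@1:", mean(hit_at_k(r, rel, 1) for r, rel in runs))
print("Hit@5:", mean(hit_at_k(r, rel, 5) for r, rel in runs))
print("R@20: ", mean(recall_at_k(r, rel, 20) for r, rel in runs))
print("MRR:  ", mean(reciprocal_rank(r, rel) for r, rel in runs))
```

Each metric is computed per query and then averaged over the query set, which is why a single number can summarize a whole benchmark.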

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.