CRAG (Corrective Retrieval Augmented Generation; area: retrieval-augmented generation): superseded, cited as a baseline and beaten by newer methods. 11 papers critique it and 11 beat it on benchmarks, placing it #11 of 1,179 most-superseded methods. Sub-problem: cluster led by Self-RAG. Newer alternatives in the same sub-problem include FAB-Bench, a predictive prefetching framework, ConflictRAG, SEMA-RAG, and PyRAG.


Superseded baseline · #11 of 1,179 most-superseded

CRAG

Corrective Retrieval Augmented Generation

Retrieval-augmented generation · first seen Jan 29, 2024

superseded — cited as a baseline and beaten by newer methods

11 papers critique it · 11 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites CRAG as a baseline.

“existing approaches, including Self-RAG [asai2024selfrag] and CRAG [yan2024crag], primarily target retrieval relevance without explicitly detecting or resolving contradictions”
— ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
“However, these methods still operate at the document level, failing to adequately filter individual text chunks.”
— ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
“it lacks the capability for high-level reasoning”
— Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
“CRAG provides a comprehensive benchmark with multi-hop and aggregation questions requiring cross-document synthesis, but operates on a fixed dataset without vertical-domain customization”
— FAB-Bench: A Framework for Adaptive RAG Benchmarking in Semiconductor Manufacturing
“CRAG [yan2024corrective], on the other hand, leverages the large-scale web search to supplement and rely on the vanilla LLM to integrate and refine knowledge from different sources. However, when the vanilla LLM fails to identify the defects in retrieved results, the whole pipeline would be broken and ineffective.”
— RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
“While these methods improve robustness against irrelevant context, they typically operate via Breadth-First Addition: they append new passages to the existing context.”
— Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly
“CRAG [yan2024corrective] uses an iterative approach with a small evaluator model but it still relies on GPT-3.5 for query rewriting.”
— Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
“While effective, these approaches often add supervision, special control tokens, auxiliary probers, or multi-stage loops that increase engineering complexity and latency.”
— TARG: Training-Free Adaptive Retrieval Gating for Efficient RAG
“While these approaches have improved robustness, leveraging LLMs' in-context learning capabilities in these scenarios is still underexplored.”
— Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning
“Although CRAG improves the quality of retrieval, it does not address inaccuracies and irrelevancies in the final response.”
— VERA: Validation and Enhancement for Retrieval Augmented systems
“the pre-processing methods introduce additional computational costs during inference and may lead to the loss of essential information.”
— R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
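Several critiques above (document-level filtering, a small evaluator model, reliance on web search and a vanilla LLM to refine results) target CRAG's retrieval-evaluation step. The following is a minimal sketch of confidence-gated retrieval in that spirit, assuming the three-action scheme of the original CRAG paper as we understand it: keep the retrieved documents if any scores above an upper threshold, fall back to web search if all score below a lower threshold, and combine both otherwise. The names `Gate`, `gate_retrieval` and `overlap_score`, the threshold values, and the word-overlap scorer are hypothetical stand-ins; CRAG itself uses a trained evaluator model, not word overlap.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    CORRECT = "correct"      # keep and refine the retrieved documents
    INCORRECT = "incorrect"  # discard retrieval, fall back to web search
    AMBIGUOUS = "ambiguous"  # combine refined documents with web results


@dataclass
class Gate:
    # Illustrative thresholds only (assumption); real values are tuned per dataset.
    upper: float = 0.7
    lower: float = 0.2

    def decide(self, scores: List[float]) -> Action:
        """Choose an action from per-document relevance scores."""
        if any(s > self.upper for s in scores):
            return Action.CORRECT
        # all() is True for an empty list, so "no documents" also triggers search.
        if all(s < self.lower for s in scores):
            return Action.INCORRECT
        return Action.AMBIGUOUS


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical toy scorer: fraction of query words that appear in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def gate_retrieval(query: str, docs: List[str],
                   score: Callable[[str, str], float], gate: Gate) -> Action:
    # Each whole document gets one score; nothing below document granularity
    # is filtered, which is the limitation ChunkRAG points out.
    return gate.decide([score(query, doc) for doc in docs])


if __name__ == "__main__":
    gate = Gate()
    docs = ["Hamlet was written by Shakespeare", "Paris is the capital of France"]
    print(gate_retrieval("who wrote hamlet", docs, overlap_score, gate).value)   # ambiguous
    print(gate_retrieval("capital of france", docs, overlap_score, gate).value)  # correct
```

The gate's decision depends entirely on the scorer: if the evaluator misjudges a document, the wrong branch is taken, which is the failure mode the RbFT and VERA critiques describe.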

Beaten on benchmarks

Head-to-head results in which a newer method reports beating CRAG, listed as newer method vs CRAG. Values are copied from the source papers' tables; verify them against the cited paper.

ConflictRAG beats CRAG · Correctness (%) [ConflictQA]
68.9 vs 54.1
ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
ConflictRAG beats CRAG · Correctness (%) [NQ-Conflict]
71.4 vs 57.2
ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
ConflictRAG beats CRAG · Correctness (%) [AmbigQA]
65.8 vs 51.2
ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
DRAG beats CRAG · ARC-C [LLaMA-2-7B backbone]
86.2 vs 68.6
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
ChunkRAG beats CRAG · Accuracy [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]
64.9 vs 59.8
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
ChunkRAG beats CRAG · FactScore [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]
86.4 vs 74.1
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
RbFT beats CRAG · EM [Llama, Clean (τ=0)]
48.4 vs 39.4
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
RbFT beats CRAG · EM [Llama, Normal (τ=0.4) - Noisy]
44.5 vs 35.0
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
RbFT beats CRAG · EM [Llama, Hard (τ=1.0) - Counterfactual]
33.8 vs 13.5
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
RbFT beats CRAG · EM [Qwen, Clean (τ=0)]
45.4 vs 37.0
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
RbFT beats CRAG · EM [Qwen, Hard (τ=1.0) - Counterfactual]
25.1 vs 11.4
RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
Tok-RAG beats CRAG · Accuracy [0% hard negative passages (clean retrieval)]
85.7 vs 82.2
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
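The listed scores can be turned into margins with a little arithmetic. The sketch below restates each reported pair as an absolute gap and as a gap relative to CRAG's score. The numbers are copied from the list above; the `margins` helper is our own illustration. Because the rows mix metrics (correctness, EM, accuracy, FactScore), datasets and backbones, the gaps should be read row by row rather than averaged.

```python
from typing import List, Tuple

# (method, metric [setting], newer-method score, CRAG score), copied from the list above.
RESULTS: List[Tuple[str, str, float, float]] = [
    ("ConflictRAG", "Correctness (%) [ConflictQA]", 68.9, 54.1),
    ("ConflictRAG", "Correctness (%) [NQ-Conflict]", 71.4, 57.2),
    ("ConflictRAG", "Correctness (%) [AmbigQA]", 65.8, 51.2),
    ("DRAG", "ARC-C [LLaMA-2-7B backbone]", 86.2, 68.6),
    ("ChunkRAG", "Accuracy [SelfRAG-LLaMA2-7b]", 64.9, 59.8),
    ("ChunkRAG", "FactScore [SelfRAG-LLaMA2-7b]", 86.4, 74.1),
    ("RbFT", "EM [Llama, Clean (τ=0)]", 48.4, 39.4),
    ("RbFT", "EM [Llama, Normal (τ=0.4) - Noisy]", 44.5, 35.0),
    ("RbFT", "EM [Llama, Hard (τ=1.0) - Counterfactual]", 33.8, 13.5),
    ("RbFT", "EM [Qwen, Clean (τ=0)]", 45.4, 37.0),
    ("RbFT", "EM [Qwen, Hard (τ=1.0) - Counterfactual]", 25.1, 11.4),
    ("Tok-RAG", "Accuracy [0% hard negative passages]", 85.7, 82.2),
]


def margins(results):
    """Return (method, setting, absolute gap in points, gap relative to CRAG in %)."""
    rows = []
    for method, setting, new, crag in results:
        gap = new - crag
        rows.append((method, setting, round(gap, 1), round(100 * gap / crag, 1)))
    return rows


if __name__ == "__main__":
    # Print rows from largest to smallest absolute gap.
    for method, setting, gap, rel in sorted(margins(RESULTS), key=lambda r: -r[2]):
        print(f"{method:12s} {setting:45s} +{gap:5.1f} pts  (+{rel:.1f}%)")
```

On these figures, the largest absolute gap is RbFT's 20.3 EM points on Llama under counterfactual retrieval (33.8 vs 13.5), and the smallest is Tok-RAG's 3.5 points with clean retrieval.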

Newer alternatives

Recent methods in the same sub-problem that are not yet superseded in the knowledge base include FAB-Bench, a predictive prefetching framework, ConflictRAG, SEMA-RAG, and PyRAG.