CRAG
Corrective Retrieval Augmented Generation · Retrieval-augmented generation · first seen Jan 29, 2024
superseded: cited as a baseline and beaten by newer methods
11 papers critique it · 11 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites CRAG as a baseline.
“existing approaches, including Self-RAG [asai2024selfrag] and CRAG [yan2024crag], primarily target retrieval relevance without explicitly detecting or resolving contradictions”
Source: ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
“However, these methods still operate at the document level, failing to adequately filter individual text chunks.”
Source: ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
“it lacks the capability for high-level reasoning”
Source: Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
“CRAG provides a comprehensive benchmark with multi-hop and aggregation questions requiring cross-document synthesis, but operates on a fixed dataset without vertical-domain customization”
Source: FAB-Bench: A Framework for Adaptive RAG Benchmarking in Semiconductor Manufacturing (this sentence appears to describe the CRAG benchmark of the same name rather than Corrective RAG)
“CRAG [yan2024corrective], on the other hand, leverages the large-scale web search to supplement and rely on the vanilla LLM to integrate and refine knowledge from different sources. However, when the vanilla LLM fails to identify the defects in retrieved results, the whole pipeline would be broken and ineffective.”
Source: RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
“While these methods improve robustness against irrelevant context, they typically operate via Breadth-First Addition: they append new passages to the existing context.”
Source: Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly
“CRAG [yan2024corrective] uses an iterative approach with a small evaluator model but it still relies on GPT-3.5 for query rewriting.”
Source: Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
“While effective, these approaches often add supervision, special control tokens, auxiliary probers, or multi-stage loops that increase engineering complexity and latency.”
Source: TARG: Training-Free Adaptive Retrieval Gating for Efficient RAG
“While these approaches have improved robustness, leveraging LLMs' in-context learning capabilities in these scenarios is still underexplored.”
Source: Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning
“Although CRAG improves the quality of retrieval, it does not address inaccuracies and irrelevancies in the final response.”
Source: VERA: Validation and Enhancement for Retrieval Augmented systems
“the pre-processing methods introduce additional computational costs during inference and may lead to the loss of essential information.”
Source: R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation
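Several critiques above describe CRAG's control flow: a small evaluator model judges the retrieved documents, and weak retrieval is supplemented with web search after an LLM rewrites the query. The following is a minimal sketch of that corrective loop, assuming the three-action scheme (Correct, Incorrect, Ambiguous) of the original CRAG paper as we understand it. The token-overlap scorer, the `UPPER`/`LOWER`/`STRIP_KEEP` thresholds, the sentence-level strip filter, and the `rewrite` and `web_search` callables are all hypothetical stand-ins for CRAG's trained evaluator, its GPT-3.5 query rewriter, and a real search API.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical thresholds on a [0, 1] relevance score; not CRAG's published values.
UPPER = 0.7       # any document above this -> "correct"
LOWER = 0.3       # every document below this -> "incorrect"
STRIP_KEEP = 0.5  # hypothetical cut-off for keeping a refined sentence strip


def overlap_score(query: str, text: str) -> float:
    """Toy relevance scorer (stand-in for CRAG's trained evaluator):
    fraction of query words that also appear in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def choose_action(scores: List[float], upper: float = UPPER, lower: float = LOWER) -> str:
    """Map per-document scores to one of the three corrective actions."""
    if any(s > upper for s in scores):
        return "correct"
    if all(s < lower for s in scores):  # also true when nothing was retrieved
        return "incorrect"
    return "ambiguous"


def refine(query: str, docs: List[str], score: Callable[[str, str], float]) -> List[str]:
    """Split documents into sentence strips and keep only the relevant ones."""
    strips = [s.strip() for d in docs for s in re.split(r"[.!?]", d) if s.strip()]
    return [s for s in strips if score(query, s) >= STRIP_KEEP]


def corrective_context(
    query: str,
    docs: List[str],
    web_search: Callable[[str], List[str]],
    rewrite: Callable[[str], str] = lambda q: q,  # hypothetical LLM query rewriter
    score: Callable[[str, str], float] = overlap_score,
) -> Tuple[str, List[str]]:
    """Return the chosen action and the knowledge passed on to the generator."""
    action = choose_action([score(query, d) for d in docs])
    if action == "correct":
        return action, refine(query, docs, score)
    external = web_search(rewrite(query))
    if action == "incorrect":
        return action, external  # retrieved documents are discarded
    return action, refine(query, docs, score) + external  # ambiguous: use both


if __name__ == "__main__":
    fake_search = lambda q: [f"web result for: {q}"]  # hypothetical search stub
    docs = ["Paris is the capital of France. It has museums."]
    print(corrective_context("capital of France", docs, fake_search))
    # -> ('correct', ['Paris is the capital of France'])
```

The single web-search call shared by the Incorrect and Ambiguous branches mirrors the RbFT critique: if the evaluator misjudges the retrieved documents, the wrong branch runs and nothing downstream corrects it.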
Beaten on benchmarks
Head-to-head results in which a newer method reports beating CRAG. Values are copied from each source paper's tables; verify them against the cited paper.
- ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation
  - Correctness (%) [ConflictQA]: ConflictRAG 68.9 vs CRAG 54.1
  - Correctness (%) [NQ-Conflict]: ConflictRAG 71.4 vs CRAG 57.2
  - Correctness (%) [AmbigQA]: ConflictRAG 65.8 vs CRAG 51.2
- DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
  - ARC-C [LLaMA-2-7B backbone]: DRAG 86.2 vs CRAG 68.6
- ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
  - Accuracy [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]: ChunkRAG 64.9 vs CRAG 59.8
  - FactScore [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]: ChunkRAG 86.4 vs CRAG 74.1
- RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects
  - EM [Llama, Clean (τ=0)]: RbFT 48.4 vs CRAG 39.4
  - EM [Llama, Normal (τ=0.4) - Noisy]: RbFT 44.5 vs CRAG 35.0
  - EM [Llama, Hard (τ=1.0) - Counterfactual]: RbFT 33.8 vs CRAG 13.5
  - EM [Qwen, Clean (τ=0)]: RbFT 45.4 vs CRAG 37.0
  - EM [Qwen, Hard (τ=1.0) - Counterfactual]: RbFT 25.1 vs CRAG 11.4
- A Theory for Token-Level Harmonization in Retrieval-Augmented Generation (Tok-RAG)
  - Accuracy [0% hard negative passages (clean retrieval)]: Tok-RAG 85.7 vs CRAG 82.2
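The raw pairs above are easier to compare as point gaps. The following is a small sketch that transcribes the listed values (the `RESULTS` table and the `margins` helper are illustrative names, not part of any source paper) and computes each newer method's margin over CRAG. The gaps are only meaningful within a single paper, since benchmarks, backbones, and metrics differ across papers.

```python
# (method, metric [setting], newer score, CRAG score), transcribed from the list above.
RESULTS = [
    ("ConflictRAG", "Correctness (%) [ConflictQA]", 68.9, 54.1),
    ("ConflictRAG", "Correctness (%) [NQ-Conflict]", 71.4, 57.2),
    ("ConflictRAG", "Correctness (%) [AmbigQA]", 65.8, 51.2),
    ("DRAG", "ARC-C [LLaMA-2-7B backbone]", 86.2, 68.6),
    ("ChunkRAG", "Accuracy [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]", 64.9, 59.8),
    ("ChunkRAG", "FactScore [(C) Advanced RAG (SelfRAG-LLaMA2-7b)]", 86.4, 74.1),
    ("RbFT", "EM [Llama, Clean (tau=0)]", 48.4, 39.4),
    ("RbFT", "EM [Llama, Normal (tau=0.4) - Noisy]", 44.5, 35.0),
    ("RbFT", "EM [Llama, Hard (tau=1.0) - Counterfactual]", 33.8, 13.5),
    ("RbFT", "EM [Qwen, Clean (tau=0)]", 45.4, 37.0),
    ("RbFT", "EM [Qwen, Hard (tau=1.0) - Counterfactual]", 25.1, 11.4),
    ("Tok-RAG", "Accuracy [0% hard negative passages (clean retrieval)]", 85.7, 82.2),
]


def margins(results):
    """Absolute gap in points (newer minus CRAG), rounded to one decimal like the tables."""
    return [(method, setting, round(new - crag, 1)) for method, setting, new, crag in results]


if __name__ == "__main__":
    # Largest margin first.
    for method, setting, gap in sorted(margins(RESULTS), key=lambda r: r[2], reverse=True):
        print(f"+{gap:4.1f}  {method:<12} {setting}")
```

Sorted this way, RbFT's counterfactual setting on Llama (+20.3 EM) comes first and Tok-RAG's clean-retrieval setting (+3.5 accuracy) comes last; the largest reported gaps appear where retrieval is deliberately corrupted.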
Newer alternatives
Recent methods addressing the same sub-problem that are not yet marked as superseded in the knowledge base.
- May 26, 2026
- May 18, 2026
- ConflictRAG · ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation · May 17, 2026
- SEMA-RAG · SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning · May 16, 2026
- PyRAG · Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation · May 13, 2026
- CoRM-RAG · Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation · May 2, 2026
- STEM · STEM: Structure-Tracing Evidence Mining for Knowledge Graphs-Driven Retrieval-Augmented Generation · Apr 24, 2026
- Apr 22, 2026
- Self-Correcting RAG · Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS · Apr 12, 2026
- Mar 7, 2026
- Cooperative Retrieval-Augmented Generation (CoRAG) · Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem · Feb 21, 2026
- Jan 29, 2026