RAFT
RAFT: A Real-World Few-Shot Text Classification Benchmark
Retrieval-augmented generation · first seen Sep 28, 2021
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites RAFT as a baseline.
“However, it suffers from conditional memorization bias and canonical answer overfitting.”
— Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
“However, RAFT-trained models exhibit a critical limitation: they are conditioned to answer queries even when provided with entirely noisy contexts.”
— Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG
“RAFT solely focuses on identifying helpful information from retrieved documents. It learns to mimic the structured output format of teacher models that extract and directly quote sentences, rather than fostering domain thinking—unleashing reasoning capabilities involving higher-order cognitive processes.”
— RARE: Retrieval-Augmented Reasoning Modeling
Beaten on benchmarks
Head-to-head results where a newer method reports beating RAFT. Values are copied from the source paper's tables — verify against the cited paper.
- Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
PA-RAG beats RAFT · Recall [Book 1 Overall]
77.1 vs 70.3
- Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
PA-RAG beats RAFT · Mixtral-J [Book 1 Overall]
92.0 vs 79.3
- Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
PA-RAG beats RAFT · Recall [Book 2 Overall]
74.7 vs 68.7
- Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
PA-RAG beats RAFT · Mixtral-J [Book 2 Overall]
83.6 vs 71.4
- Feedback Adaptation for Retrieval-Augmented Generation
PatchRAG beats RAFT · NQ (Exact Match) [t (post-feedback)]
49.8 vs 41.9
- Feedback Adaptation for Retrieval-Augmented Generation
PatchRAG beats RAFT · TriviaQA (Exact Match) [t (post-feedback)]
83.9 vs 80.5
- Feedback Adaptation for Retrieval-Augmented Generation
PatchRAG beats RAFT · HotpotQA (F1) [t (post-feedback)]
53.2 vs 49.4
- Feedback Adaptation for Retrieval-Augmented Generation
PatchRAG beats RAFT · Average [t (post-feedback)]
62.3 vs 57.3
- DMA: Online RAG Alignment with Human Feedback
DMA beats RAFT · Hit@1 [TriviaQA (Conversational QA)]
68.81 vs 60.10
- DMA: Online RAG Alignment with Human Feedback
DMA beats RAFT · F1 [TriviaQA (Conversational QA)]
68.90 vs 57.40
- DMA: Online RAG Alignment with Human Feedback
DMA beats RAFT · Hit@1 [HotpotQA (Conversational QA)]
33.92 vs 30.20
- DMA: Online RAG Alignment with Human Feedback
DMA beats RAFT · F1 [HotpotQA (Conversational QA)]
41.88 vs 35.80
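Because these values are copied from tables, a quick arithmetic check can catch transcription slips before going back to the cited paper. The sketch below recomputes the per-benchmark margins for the PatchRAG entries and tests whether the listed Average row matches the mean of the three benchmark scores. It assumes the Average is an unweighted mean of NQ, TriviaQA, and HotpotQA, which the entries above do not state; the names `patchrag_vs_raft`, `margins`, and `mean_scores` are illustrative only and do not come from any of the cited papers.

```python
# Scores copied from the PatchRAG entries above as (PatchRAG, RAFT) pairs.
patchrag_vs_raft = {
    "NQ (Exact Match)": (49.8, 41.9),
    "TriviaQA (Exact Match)": (83.9, 80.5),
    "HotpotQA (F1)": (53.2, 49.4),
}

# The "Average" row as listed above, again as (PatchRAG, RAFT).
reported_average = (62.3, 57.3)


def margins(results):
    """Per-benchmark margin (newer method minus RAFT), rounded to one decimal."""
    return {name: round(new - raft, 1) for name, (new, raft) in results.items()}


def mean_scores(results):
    """Unweighted mean score for each method, rounded to one decimal.

    Assumption: the listed Average is a plain mean of the benchmark scores.
    """
    count = len(results)
    new_mean = sum(new for new, _ in results.values()) / count
    raft_mean = sum(raft for _, raft in results.values()) / count
    return round(new_mean, 1), round(raft_mean, 1)


if __name__ == "__main__":
    for name, gap in margins(patchrag_vs_raft).items():
        print(f"{name}: +{gap}")
    recomputed = mean_scores(patchrag_vs_raft)
    print("recomputed average:", recomputed, "listed:", reported_average)
    print("consistent:", recomputed == reported_average)
```

If the recomputed and listed averages disagree, that points to either a copying error or a different averaging scheme in the source paper, so the paper's own table remains the reference.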
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation · Apr 21, 2026
- Apr 2, 2026
- Feb 24, 2026
- Jan 16, 2026
- Nov 6, 2025