RAFT (Retrieval-augmented generation): superseded, meaning it is cited as a baseline and beaten by newer methods. 3 papers critique it and 4 beat it on benchmarks, which ranks it #31 of 1,179 most-superseded. Sub-problem: cluster led by RetRobust. Newer alternatives in the same sub-problem include Stable-RAG, Neuro-RIT, ERA, TTARAG, and DMA (Dynamic Memory Alignment).

Superseded baseline · #31 of 1,179 most-superseded

RAFT

RAFT: A Real-World Few-Shot Text Classification Benchmark

Retrieval-augmented generation · first seen Sep 28, 2021

superseded — cited as a baseline and beaten by newer methods

3 papers critique it · 4 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites RAFT as a baseline.

“However, it suffers from conditional memorization bias and canonical answer overfitting.”
— Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
“However, RAFT-trained models exhibit a critical limitation: they are conditioned to answer queries even when provided with entirely noisy contexts.”
— Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG
“RAFT solely focuses on identifying helpful information from retrieved documents. It learns to mimic the structured output format of teacher models that extract and directly quote sentences, rather than fostering domain thinking—unleashing reasoning capabilities involving higher-order cognitive processes.”
— RARE: Retrieval-Augmented Reasoning Modeling

Beaten on benchmarks

Head-to-head results in which a newer method reports beating RAFT. Values are copied from each source paper's tables; verify them against the cited paper.

PA-RAG beats RAFT
Recall [Book 1 Overall] · 77.1 vs 70.3
Mixtral-J [Book 1 Overall] · 92.0 vs 79.3
Recall [Book 2 Overall] · 74.7 vs 68.7
Mixtral-J [Book 2 Overall] · 83.6 vs 71.4
Source: Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG
PatchRAG beats RAFT
NQ (Exact Match) [t (post-feedback)] · 49.8 vs 41.9
TriviaQA (Exact Match) [t (post-feedback)] · 83.9 vs 80.5
HotpotQA (F1) [t (post-feedback)] · 53.2 vs 49.4
Average [t (post-feedback)] · 62.3 vs 57.3
Source: Feedback Adaptation for Retrieval-Augmented Generation
DMA beats RAFT
Hit@1 [TriviaQA (Conversational QA)] · 68.81 vs 60.10
F1 [TriviaQA (Conversational QA)] · 68.90 vs 57.40
Hit@1 [HotpotQA (Conversational QA)] · 33.92 vs 30.20
F1 [HotpotQA (Conversational QA)] · 41.88 vs 35.80
Source: DMA: Online RAG Alignment with Human Feedback
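
As a quick arithmetic check on the PatchRAG results listed above, the sketch below recomputes the absolute gap on each benchmark and tests whether the reported Average agrees with the unweighted mean of the three benchmark scores. That the Average row is an unweighted mean is an assumption, not something the listing states, and the function and variable names are illustrative only.

```python
# Values copied from the PatchRAG vs RAFT results listed above (post-feedback setting).
# Each tuple is (PatchRAG score, RAFT score).
results = {
    "NQ (Exact Match)": (49.8, 41.9),
    "TriviaQA (Exact Match)": (83.9, 80.5),
    "HotpotQA (F1)": (53.2, 49.4),
}
reported_average = (62.3, 57.3)


def margins(pairs):
    """Return the absolute gap in points (newer minus baseline) per benchmark."""
    return {name: round(new - old, 1) for name, (new, old) in pairs.items()}


def mean_scores(pairs):
    """Unweighted mean of each column, rounded to one decimal place.

    Assumption: the listing's Average row is a plain mean of the three scores.
    """
    n = len(pairs)
    new_mean = sum(new for new, _ in pairs.values()) / n
    old_mean = sum(old for _, old in pairs.values()) / n
    return round(new_mean, 1), round(old_mean, 1)


gaps = margins(results)
recomputed = mean_scores(results)

for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
print("recomputed average:", recomputed, "reported:", reported_average)
```

Under that assumption the recomputed means match the reported Average, so the row is at least internally consistent. This does not replace checking the values against the cited paper's tables.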

Newer alternatives

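Recent methods in the same sub-problem (the cluster led by RetRobust), not yet superseded in the knowledge base, include Stable-RAG, Neuro-RIT, ERA, TTARAG, and DMA (Dynamic Memory Alignment).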