FLARE (Retrieval-augmented generation): superseded; cited as a baseline and beaten by newer methods. 11 papers critique it and 14 beat it on benchmarks, ranking #6 of 1,179 most-superseded methods. Sub-problem: a cluster led by Self-RAG. Newer alternatives in the same sub-problem include FAB-Bench, the predictive prefetching framework, ConflictRAG, SEMA-RAG, and PyRAG.


Superseded baseline · #6 of 1,179 most-superseded

FLARE

Active Retrieval Augmented Generation

Retrieval-augmented generation · first seen May 11, 2023

superseded — cited as a baseline and beaten by newer methods

11 papers critique it · 14 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites FLARE as a baseline.

“Adaptive methods such as FLARE [jiang2023active], Self-RAG [asai2024selfrag], and DRAGIN [su2024dragin] dynamically trigger retrieval based on uncertainty signals, but do so reactively, first detecting uncertainty and then blocking generation to perform retrieval.”
— Predictive Prefetching for Retrieval-Augmented Generation
“The generate-then-retrieve approach, while effective, is inefficient for queries that definitely need retrieval, as it introduces an extra generation step.”
— Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation
“these methods overlook the potential noise introduced when handling multiple entities, which can degrade output quality”
— MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG
“However, they both rely on numerous iterative retrieval and pseudo-generations, leading to significant computational costs as well.”
— Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
“FLARE initiates retrieval when any token in a generated sentence has a probability below a certain threshold”
— Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
“While effective, these approaches often add supervision, special control tokens, auxiliary probers, or multi-stage loops that increase engineering complexity and latency.”
— TARG: Training-Free Adaptive Retrieval Gating for Efficient RAG
“Although this method more precisely identifies the LLM's information needs, its efficacy heavily depends on meticulously crafted few-shot prompts [brown2020languagemodelsfewshotlearners] and requires continuous retrieval and refinement, leading to substantial manual effort and increased inference costs.”
— Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models
“Although relying on low confidence token for retrieve seems intuitive, we argue that this often results in delayed retrieval, failing to intervene at the optimal moment.”
— Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG
“However, focusing on local tokens may neglect global reasoning needs.”
— Removal of Hallucination on Hallucination: Debate-Augmented RAG
“However, current dynamic RAG methods fail to predict whether the LLM has the capability to answer a question prior to generation, thereby triggering retrieval in advance. Moreover, most methods often rely on static rules, leading to ineffective timing for retrieval triggers during the generation process.”
— DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
“However, the former approach has limitations as LLMs tend to be overconfident, generating high-confidence probability distributions even when lacking relevant knowledge.”
— ICA-RAG: Information Completeness Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis
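The trigger rule quoted from Probing-RAG above (retrieve when any token in a generated sentence falls below a probability threshold) can be illustrated with a minimal sketch. Everything here is hypothetical: the `Token` type, the function names, the threshold values, and the example sentence are assumptions for illustration, and the language model and retriever are omitted entirely. The `masked_query` helper is a simplified take on building the retrieval query from the tentative sentence while dropping low-confidence tokens; treat it as an assumption, not as FLARE's reference implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    prob: float  # probability the model assigned to this token (hypothetical type)


def needs_retrieval(sentence: List[Token], theta: float) -> bool:
    """Trigger rule: retrieve if ANY token in the tentative sentence has prob < theta."""
    return any(tok.prob < theta for tok in sentence)


def masked_query(sentence: List[Token], beta: float) -> str:
    """Build a query from the tentative sentence, dropping tokens with prob < beta
    so that content the model is unsure of does not steer retrieval (assumption)."""
    return " ".join(tok.text for tok in sentence if tok.prob >= beta)


# Hypothetical tentative next sentence; the model is unsure about the last token.
tentative = [
    Token("The", 0.98), Token("capital", 0.95), Token("of", 0.99),
    Token("Australia", 0.93), Token("is", 0.97), Token("Sydney", 0.22),
]

query = ""
if needs_retrieval(tentative, theta=0.4):
    query = masked_query(tentative, beta=0.4)
    print(query)  # The capital of Australia is
```

In a full loop, the retrieved documents would be used to regenerate the sentence before moving on. Note that the check can only run after the tentative sentence already exists, which is the reactive behavior the Predictive Prefetching critique describes and the extra generation step the Think-then-Act critique points to.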

Beaten on benchmarks

Head-to-head results where a newer method reports beating FLARE. Values are copied from the source paper's tables — verify against the cited paper.

DeepNote beats FLARE · F1 [Adaptive RAG baseline FLARE]
51.1 vs 47.8
DeepNote: Note-Centric Deep Retrieval-Augmented Generation
predictive prefetching framework beats FLARE · F1 [HotpotQA]
75.1 vs 72.1
Predictive Prefetching for Retrieval-Augmented Generation
predictive prefetching framework beats FLARE · E2E [HotpotQA] (lower is better)
5.2 vs 6.8
Predictive Prefetching for Retrieval-Augmented Generation
UAR beats FLARE · Overall [7B Models]
85.32 vs 56.50
Unified Active Retrieval for Retrieval Augmented Generation
UAR beats FLARE · Overall [13B Models]
86.33 vs 57.21
Unified Active Retrieval for Retrieval Augmented Generation
SynCheck_MLP beats FLARE · AUROC [Llama 2 7B Chat]
0.831 vs 0.618
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
SynCheck_MLP beats FLARE · AUROC [Mistral 7B Instruct]
0.867 vs 0.622
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
Probing-RAG beats FLARE · ACC [Gemma-2b in-domain HotpotQA]
39.4 vs 21.0
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Probing-RAG beats FLARE · ACC [Gemma-2b in-domain NQ]
35.0 vs 21.8
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Probing-RAG beats FLARE · ACC [Gemma-2b in-domain TriviaQA]
52.2 vs 31.0
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Probing-RAG beats FLARE · ACC [Gemma-2b out-of-domain MuSiQue]
8.8 vs 5.0
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
Probing-RAG beats FLARE · ACC [Gemma-2b out-of-domain 2Wiki]
43.6 vs 27.8
Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval
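Because "beats" means lower-is-better for the E2E row but higher-is-better for the other metrics, the raw "X vs Y" pairs are easy to misread. A minimal sketch for computing a signed margin over FLARE from a few of the rows listed above, assuming E2E is lower-is-better (inferred only from 5.2 being reported as beating 6.8). The `Result` type and `margin` function are hypothetical helpers; the numbers are copied from the list and should be checked against the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Result:
    method: str
    metric: str
    setting: str
    newer: float
    flare: float
    higher_is_better: bool = True


def margin(r: Result) -> float:
    """Signed improvement over FLARE; positive means the newer method is better."""
    return r.newer - r.flare if r.higher_is_better else r.flare - r.newer


# A few rows copied from the benchmark list.
results = [
    Result("DeepNote", "F1", "Adaptive RAG baseline FLARE", 51.1, 47.8),
    Result("predictive prefetching framework", "F1", "HotpotQA", 75.1, 72.1),
    Result("predictive prefetching framework", "E2E", "HotpotQA", 5.2, 6.8,
           higher_is_better=False),
    Result("Probing-RAG", "ACC", "Gemma-2b out-of-domain MuSiQue", 8.8, 5.0),
]

for r in results:
    print(f"{r.method} · {r.metric} [{r.setting}]: {margin(r):+.1f}")
```

Absolute margins on different metrics (F1, latency-style E2E, accuracy) are not comparable with each other; the sketch only normalizes direction so that every "beats" row yields a positive number.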

Newer alternatives

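Recent methods in the same sub-problem (the cluster led by Self-RAG), not yet superseded in the knowledge base, include FAB-Bench, the predictive prefetching framework, ConflictRAG, SEMA-RAG, and PyRAG.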