FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations
This work addresses a critical vulnerability in RAG systems for users who rely on factual generation, though it appears incremental: it builds on existing RAG frameworks and adds a novel verification approach.
The paper tackles retrieval sycophancy in RAG systems: when a query embeds a user misconception, the retriever fetches biased documents that agree with it, and the model then hallucinates with citations. It introduces FVA-RAG, which uses an adversarial retrieval policy to seek contradictory evidence. Preliminary experiments show that it significantly improves robustness against such hallucinations compared to standard RAG baselines.
Retrieval-Augmented Generation (RAG) systems have significantly reduced hallucinations in Large Language Models (LLMs) by grounding responses in external context. However, standard RAG architectures suffer from a critical vulnerability: Retrieval Sycophancy. When presented with a query based on a false premise or a common misconception, vector-based retrievers tend to fetch documents that align with the user's bias rather than with objective truth, leading the model to "hallucinate with citations." In this work, we introduce Falsification-Verification Alignment RAG (FVA-RAG), a framework that shifts the retrieval paradigm from Inductive Verification (seeking support) to Deductive Falsification (seeking disproof). Unlike existing "Self-Correction" methods that rely on internal consistency, FVA-RAG deploys a distinct Adversarial Retrieval Policy that actively generates "Kill Queries": targeted search terms designed to surface contradictory evidence. We further introduce a dual-verification mechanism that explicitly weighs the draft answer against this "Anti-Context." Preliminary experiments on a dataset of common misconceptions demonstrate that FVA-RAG significantly improves robustness against sycophantic hallucinations compared to standard RAG baselines, effectively acting as an inference-time "Red Team" for factual generation.
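The abstract describes the falsification loop only at a high level, so what follows is a minimal, self-contained sketch of the idea under loudly stated assumptions, not the authors' implementation. A bag-of-words cosine retriever stands in for a dense vector retriever. `generate_kill_queries` is a hypothetical template function in place of the LLM that would write Kill Queries. `contradicts` is a hypothetical keyword-plus-overlap heuristic in place of the paper's dual-verification mechanism. Excluding already-retrieved context from the adversarial search is this sketch's own choice. None of these names, heuristics, or the toy corpus come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for an embedding model's input."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Similarity-only ranking (the sycophancy-prone step); drops zero-score hits."""
    q = Counter(tokenize(query))
    scored = sorted(((cosine(q, Counter(tokenize(d))), d) for d in docs),
                    key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def generate_kill_queries(draft: str) -> list[str]:
    # Hypothetical: fixed templates instead of an LLM writing disproof-seeking queries.
    return [f"myth debunked {draft}", f"evidence against {draft}",
            f"is it false that {draft}"]


# Hypothetical cue words standing in for an LLM judge of contradiction.
CONTRADICTION_CUES = {"myth", "false", "debunked", "misconception"}


def contradicts(doc: str, draft: str, min_overlap: float = 0.2) -> bool:
    """Assumed heuristic: doc is on-topic for the draft AND contains a refutation cue."""
    on_topic = cosine(Counter(tokenize(doc)), Counter(tokenize(draft))) >= min_overlap
    return on_topic and bool(CONTRADICTION_CUES & set(tokenize(doc)))


def fva_rag(question: str, draft: str, corpus: list[str]) -> dict:
    # 1. Standard RAG step: retrieve context for the (possibly false-premise) question.
    context = retrieve(question, corpus, k=1)
    # 2. Adversarial step: search the remaining documents with kill queries.
    pool = [d for d in corpus if d not in context]
    anti_context: list[str] = []
    for kq in generate_kill_queries(draft):
        for doc in retrieve(kq, pool, k=1):
            if doc not in anti_context:
                anti_context.append(doc)
    # 3. Verification: weigh the draft against both context and anti-context.
    refuting = [d for d in context + anti_context if contradicts(d, draft)]
    return {"context": context, "anti_context": anti_context,
            "refuting": refuting, "verdict": "revise" if refuting else "keep"}


corpus = [
    "We only use ten percent of our brain, which is why brain training apps work.",
    "Unlock the unused ninety percent of your brain with these tips.",
    "The ten percent of the brain claim is a myth: imaging shows activity "
    "across virtually all brain regions.",
    "Goldfish have a memory span of months, not three seconds.",
]
result = fva_rag("Why do we only use ten percent of our brain?",
                 "we only use ten percent of our brain", corpus)
print(result["context"][0])    # the premise-confirming document ranks first
print(result["anti_context"])  # the debunking document surfaced by kill queries
print(result["verdict"])       # revise
```

On this toy corpus, similarity-only retrieval returns the document that repeats the false premise, while the kill queries surface the debunking document, so the draft is flagged for revision. The on-topic check in `contradicts` keeps an unrelated "myth" document from vetoing a correct draft (for example, a goldfish-memory draft is kept). A real system would replace each heuristic with model calls; the sketch only illustrates why seeking disproof changes which evidence the verifier sees.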