CL AI Feb 28, 2025

Aligning Extraction and Generation for Robust Retrieval-Augmented Generation

arXiv:2503.04789v3 2 citations h-index: 2 WSDM
Originality: Incremental advance
AI Analysis

This work addresses unreliable generation in RAG systems, which matters to users who depend on accurate AI responses. The advance is incremental, since it builds on existing RAG methods.

The paper tackles the vulnerability of retrieval-augmented generation (RAG) to noisy and irrelevant retrieved content, which causes hallucinations. It introduces Ext2Gen, a framework that jointly selects evidence and generates answers, yielding substantial robustness gains and outperforming methods such as Recomp and CompAct.

Retrieval-augmented generation (RAG) enhances LLMs with external knowledge, yet generation remains vulnerable to retrieval-induced noise and uncertain placement of relevant chunks, often causing hallucinations. We present Ext2Gen, an extract-then-generate framework that strengthens LLMs via joint evidence selection and answer generation, dynamically identifying query-relevant content while suppressing noise, thereby removing the need for any independent pre-generation compression module. Optimized through preference alignment with well-curated pairwise feedback, Ext2Gen produces accurate and faithful answers even under noisy or imprecise retrieval. Experiments demonstrate that it substantially enhances the robustness of the generation backbone and yields greater performance gains than methods relying on independent compression models (e.g., Recomp, CompAct, EXIT). It further benefits from improved retrieval techniques such as query rewriting, underscoring that generation-side enhancements address limitations that retrieval alone cannot overcome.
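The extract-then-generate idea in the abstract can be illustrated with a minimal Python sketch. It covers only the inference-time shape: a single model call that first copies query-relevant sentences from the retrieved chunks and then answers from them. It does not cover preference alignment. Everything here is an assumption made for illustration, not the paper's actual prompt or code: the `Extraction:` and `Answer:` tags, the prompt wording, and the hypothetical helpers `build_prompt`, `parse_output`, and `extract_then_generate`. The `llm` argument stands in for any text-in, text-out model.

```python
from typing import Callable, List, Tuple

# Hypothetical output markers; the paper's real format may differ.
EXTRACT_TAG = "Extraction:"
ANSWER_TAG = "Answer:"


def build_prompt(query: str, chunks: List[str]) -> str:
    """Number the retrieved chunks and ask for extraction, then an answer."""
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Read the retrieved chunks. First copy the sentences relevant to the "
        "query, then answer using only those sentences.\n\n"
        f"Chunks:\n{numbered}\n\n"
        f"Query: {query}\n\n"
        f"Respond as:\n{EXTRACT_TAG} <relevant sentences>\n{ANSWER_TAG} <final answer>"
    )


def parse_output(text: str) -> Tuple[str, str]:
    """Split a model response into (extracted evidence, answer)."""
    _, found_extract, rest = text.partition(EXTRACT_TAG)
    extraction, found_answer, answer = rest.partition(ANSWER_TAG)
    if not (found_extract and found_answer):
        raise ValueError("expected 'Extraction:' followed by 'Answer:'")
    return extraction.strip(), answer.strip()


def extract_then_generate(
    query: str, chunks: List[str], llm: Callable[[str], str]
) -> Tuple[str, str]:
    """One model call yields both the selected evidence and the answer."""
    return parse_output(llm(build_prompt(query, chunks)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model, used only to make the sketch runnable."""
    return "Extraction: Paris is the capital of France.\nAnswer: Paris"


chunks = ["Bananas are rich in potassium.", "Paris is the capital of France."]
evidence, answer = extract_then_generate(
    "What is the capital of France?", chunks, fake_llm
)
```

Because extraction and answering happen in one response, no separate compression model sits between retrieval and generation, which is the design point the abstract emphasizes.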

