CL AI Feb 28, 2025

Aligning Extraction and Generation for Robust Retrieval-Augmented Generation

Hwanjun Song, Jeonghwan Choi, Minseok Kim

arXiv:2503.04789v34.92 citations h-index: 2 WSDM

Originality: Incremental advance

AI Analysis

This paper addresses unreliable generation in RAG systems, which matters to users who rely on accurate AI responses. The advance is incremental because it builds on existing RAG methods.

The paper targets a known weakness of retrieval-augmented generation (RAG): noisy and irrelevant retrieved content leads to hallucinations. It introduces Ext2Gen, a framework that jointly selects evidence and generates answers, and it reports substantial robustness gains over compression-based methods such as Recomp and CompAct.

Retrieval-augmented generation (RAG) enhances LLMs with external knowledge, yet generation remains vulnerable to retrieval-induced noise and uncertain placement of relevant chunks, often causing hallucinations. We present Ext2Gen, an extract-then-generate framework that strengthens LLMs via joint evidence selection and answer generation, dynamically identifying query-relevant content while suppressing noise, thereby removing the need for any independent pre-generation compression module. Optimized through preference alignment with well-curated pairwise feedback, Ext2Gen produces accurate and faithful answers even under noisy or imprecise retrieval. Experiments demonstrate that it substantially enhances the robustness of the generation backbone and yields greater performance gains than methods relying on independent compression models (e.g., Recomp, CompAct, EXIT). It further benefits from improved retrieval techniques such as query rewriting, underscoring that generation-side enhancements address limitations that retrieval alone cannot overcome.
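The extract-then-generate idea in the abstract can be illustrated with a minimal sketch: the model is asked to quote the query-relevant sentences first and then answer from those sentences only, all in a single generation pass. The prompt wording, the `<extraction>` and `<answer>` tags, and the function names `build_prompt` and `parse_output` are assumptions made for illustration; they are not taken from the paper, and the sketch does not cover the preference-alignment training.

```python
import re
from typing import List, Tuple


def build_prompt(query: str, chunks: List[str]) -> str:
    """Build a single prompt that asks for evidence extraction, then an answer.

    Hypothetical format: the tag names and wording are illustrative only.
    """
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Retrieved chunks (some may be irrelevant):\n"
        f"{numbered}\n\n"
        f"Question: {query}\n\n"
        "First copy the sentences that help answer the question inside "
        "<extraction>...</extraction>. Then answer using only those "
        "sentences inside <answer>...</answer>."
    )


def parse_output(text: str) -> Tuple[str, str]:
    """Split a model response into (extracted evidence, answer).

    A missing tag yields an empty string for that part.
    """
    def grab(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else ""

    return grab("extraction"), grab("answer")
```

Because extraction and answering share one prompt and one response, no separate compression model sits between the retriever and the generator, which is the design point the abstract emphasizes.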

