GINGER: Grounded Information Nugget-Based Generation of Responses
This work addresses factual accuracy and source attribution in RAG systems, which matter to users who rely on AI-generated responses. The contribution appears incremental: it builds on existing RAG methods, and its novelty lies in the modular, nugget-based pipeline.
The paper tackles challenges in retrieval-augmented generation (RAG) related to factual correctness and source attribution by proposing GINGER, a modular pipeline that uses information nuggets for grounded response generation, achieving state-of-the-art performance on the TREC RAG'24 dataset.
Retrieval-augmented generation (RAG) faces challenges related to factual correctness, source attribution, and response completeness. To address them, we propose a modular pipeline for grounded response generation that operates on information nuggets (minimal, atomic units of relevant information extracted from retrieved documents). The multistage pipeline encompasses nugget detection, clustering, ranking, top cluster summarization, and fluency enhancement. It guarantees grounding in specific facts, facilitates source attribution, and ensures maximum information inclusion within length constraints. Extensive experiments on the TREC RAG'24 dataset evaluated with the AutoNuggetizer framework demonstrate that GINGER achieves state-of-the-art performance on this benchmark.
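The stage order described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every stage is replaced by a simple lexical heuristic chosen only to keep the example self-contained, and all names (Nugget, detect_nuggets, cluster_nuggets, rank_clusters, summarize_cluster, generate_response, the threshold and top_k parameters) are hypothetical. Fluency enhancement is left out, and top_k stands in for the length constraint.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Nugget:
    text: str
    doc_id: str  # source passage, kept so the response can cite it


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_nuggets(query, docs):
    """Stand-in detection: keep sentences that share a term with the query."""
    q = tokens(query)
    nuggets = []
    for doc_id, text in docs.items():
        for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sent and tokens(sent) & q:
                nuggets.append(Nugget(sent, doc_id))
    return nuggets


def cluster_nuggets(nuggets, threshold=0.5):
    """Stand-in clustering: greedily join the first cluster with a similar seed."""
    clusters = []
    for n in nuggets:
        for c in clusters:
            if jaccard(tokens(n.text), tokens(c[0].text)) >= threshold:
                c.append(n)
                break
        else:
            clusters.append([n])
    return clusters


def rank_clusters(query, clusters):
    """Stand-in ranking: query overlap of the seed nugget, then cluster size."""
    q = tokens(query)
    return sorted(
        clusters,
        key=lambda c: (jaccard(q, tokens(c[0].text)), len(c)),
        reverse=True,
    )


def summarize_cluster(cluster):
    """Stand-in summary: the shortest nugget, citing every source in the cluster."""
    rep = min(cluster, key=lambda n: len(n.text))
    sources = sorted({n.doc_id for n in cluster})
    return rep.text, sources


def generate_response(query, docs, top_k=2):
    """Run the stages in order; top_k is a crude proxy for a length budget."""
    ranked = rank_clusters(query, cluster_nuggets(detect_nuggets(query, docs)))
    parts = []
    for cluster in ranked[:top_k]:
        text, sources = summarize_cluster(cluster)
        parts.append(f"{text} [{', '.join(sources)}]")
    # Fluency enhancement is omitted: summaries are simply concatenated.
    return " ".join(parts)


if __name__ == "__main__":
    example_docs = {
        "d1": "RAG combines retrieval with generation. The weather was nice.",
        "d2": "RAG combines retrieval with text generation. "
              "Attribution links claims to sources.",
    }
    print(generate_response("RAG retrieval attribution", example_docs))
```

Because each nugget carries the identifier of the passage it came from, every sentence in the output can be traced back to its sources, which is the property the abstract refers to as facilitating source attribution. Clustering near-duplicate nuggets before summarizing is what lets a fixed budget cover more distinct facts.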