CL, AI · Nov 14, 2023

Learning to Filter Context for Retrieval-Augmented Generation

arXiv:2311.08377v1 · 77 citations · h-index: 91
Originality: Incremental advance
AI Analysis

This work addresses the reliability of retrieval-augmented generation for tasks such as open-domain QA and fact verification, and represents an incremental improvement in context filtering.

The paper tackles the problem that retrieval-augmented generation systems can produce flawed outputs, such as hallucinations, when the retrieved context is irrelevant. It proposes FILCO, which filters context using lexical and information-theoretic methods, and reports that it outperforms existing approaches on six knowledge-intensive tasks with FLAN-T5 and LLaMa2.

On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to generate outputs given partially or entirely irrelevant passages. This can cause over- or under-reliance on context, and result in problems in the generated output such as hallucinations. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output.
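As a rough illustration of step (1), the sketch below selects the single retrieved sentence that best overlaps with a reference output. It assumes unigram F1 as the lexical measure and a naive regex sentence splitter; the abstract only says "lexical and information-theoretic approaches", so the scoring function, the threshold, and all names here (`unigram_f1`, `filter_context`) are hypothetical choices for illustration, not the authors' implementation. The information-theoretic measure and the trained filtering model from step (2) are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate, reference):
    """Unigram F1 overlap between a candidate sentence and a reference output."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def split_sentences(passage):
    """Naive sentence splitter on ., !, ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def filter_context(passages, reference, threshold=0.0):
    """Return the sentence with the highest overlap score above `threshold`.

    Returns an empty string when no sentence scores above the threshold,
    i.e. when the retrieved passages look irrelevant to the reference.
    """
    best_sentence, best_score = "", threshold
    for passage in passages:
        for sentence in split_sentences(passage):
            score = unigram_f1(sentence, reference)
            if score > best_score:
                best_sentence, best_score = sentence, score
    return best_sentence


if __name__ == "__main__":
    retrieved = [
        "Paris is the capital of France. It hosts the Louvre.",
        "Bananas are yellow.",
    ]
    print(filter_context(retrieved, "Paris is the capital of France"))
```

Because this scoring needs a reference output, it can only label useful context when the answer is known. That is consistent with the abstract's second step, where a separate filtering model is trained so that filtering can be applied at test time without the reference.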

Code Implementations: 2 repos
