CLIR | May 11, 2025

The Distracting Effect: Understanding Irrelevant Passages in RAG

arXiv:2505.06914v133 citations | h-index: 4 | ACL
Originality: Incremental advance
AI Analysis

The paper addresses a core issue in RAG, namely answer accuracy in question-answering systems, though its contribution is incremental and refines existing methods.

The paper tackles the problem of irrelevant passages distracting LLMs in RAG systems. Fine-tuning LLMs with hard distracting passages yields up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets.

A well-known issue with Retrieval-Augmented Generation (RAG) is that retrieved passages that are irrelevant to the query sometimes distract the answer-generating LLM, causing it to provide an incorrect response. In this paper, we shed light on this core issue and formulate the distracting effect of a passage w.r.t. a query (and an LLM). We provide a quantifiable measure of the distracting effect of a passage and demonstrate its robustness across LLMs. Our research introduces novel methods for identifying and using hard distracting passages to improve RAG systems. By fine-tuning LLMs with these carefully selected distracting passages, we achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely unrelated or distracting, and second, we develop and analyze multiple methods for finding hard distracting passages. To our knowledge, no other research has provided such a comprehensive framework for identifying and utilizing hard distracting passages.
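
To make the idea of a graded (rather than binary) distracting effect concrete, here is a minimal sketch. It is an illustrative assumption, not the paper's stated definition: the abstract does not say how the measure is computed. The sketch assumes we already have, for each irrelevant passage, a hypothetical probability `p_abstain` that the LLM declines to answer when shown only that passage with the query, and it scores the passage as `1 - p_abstain`. The names `Candidate`, `distracting_effect`, and `hardest_distractors` are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """An irrelevant passage with a hypothetical abstention probability."""
    passage: str
    p_abstain: float  # assumed: P(LLM abstains | query, passage), obtained elsewhere


def distracting_effect(p_abstain: float) -> float:
    """Assumed score: the less likely the LLM is to abstain, the more distracting."""
    if not 0.0 <= p_abstain <= 1.0:
        raise ValueError("p_abstain must be a probability in [0, 1]")
    return 1.0 - p_abstain


def hardest_distractors(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Return the k candidates with the highest assumed distracting effect."""
    ranked = sorted(
        candidates,
        key=lambda c: distracting_effect(c.p_abstain),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        Candidate("unrelated passage", p_abstain=0.9),
        Candidate("near-miss passage", p_abstain=0.2),
        Candidate("loosely related passage", p_abstain=0.5),
    ]
    for c in hardest_distractors(pool, k=2):
        print(c.passage, distracting_effect(c.p_abstain))
```

Under this assumption, passages selected by `hardest_distractors` would be the candidates to mix into a fine-tuning set, while passages with a score near zero behave like the "completely unrelated" case.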

