DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
This work addresses a critical issue for users of retrieval-augmented generation (RAG) systems by providing tools to evaluate and improve how models resolve conflicts between sources, though it is incremental, building on existing RAG methods.
The paper tackles the problem of conflicting information in the sources retrieved for search-augmented LLMs. It introduces a taxonomy of conflict types and a benchmark called CONFLICTS, and finds that LLMs struggle to resolve conflicts, though prompting them to reason explicitly improves response quality.
Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources often contain conflicting information, and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. While prompting LLMs to explicitly reason about potential conflicts in the retrieved documents significantly improves the quality and appropriateness of their responses, substantial room for improvement remains for future research.
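To make the idea of "prompting LLMs to explicitly reason about potential conflicts" concrete, here is a minimal sketch of a prompt builder that lists the retrieved documents and asks the model to assess whether they conflict before answering. The names `RetrievedDoc` and `build_conflict_aware_prompt`, and the instruction wording, are hypothetical illustrations; this is not the paper's actual prompt and does not encode its taxonomy.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    # Hypothetical container for one search result passed to the model.
    title: str
    snippet: str


def build_conflict_aware_prompt(query: str, docs: list[RetrievedDoc]) -> str:
    """Assemble a RAG prompt that asks the model to reason about conflicts first.

    The instruction text below is an assumed, illustrative wording,
    not the prompt used in the paper.
    """
    lines = [f"Question: {query}", "", "Retrieved documents:"]
    # Number documents so the model can cite them in its answer.
    for i, doc in enumerate(docs, start=1):
        lines.append(f"[{i}] {doc.title}: {doc.snippet}")
    lines += [
        "",
        "Before answering, briefly state whether the documents above agree or conflict,",
        "and if they conflict, what kind of conflict it appears to be.",
        "Then write an answer that handles the conflict appropriately, citing documents by number.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_conflict_aware_prompt(
        "When was the bridge completed?",
        [RetrievedDoc("City archive", "Completed in 1931."),
         RetrievedDoc("Travel blog", "Opened to traffic in 1932.")],
    )
    print(prompt)
```

The resulting string would be sent to whatever LLM the RAG system uses; keeping the conflict-assessment step as an explicit instruction, rather than leaving it implicit, is the design choice the abstract reports as helpful.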