CL · AI · Nov 15, 2023

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

arXiv:2311.09210v2 · 207 citations · h-index: 23
AI Analysis

This paper addresses reliability issues in RALMs for open-domain QA, reducing factual errors and improving the handling of unknown queries, though it is an incremental advance over existing RALM methods.

The paper tackles two problems with retrieval-augmented language models (RALMs): being misled by irrelevant retrieved documents and failing to handle unknown queries. It introduces Chain-of-Noting (CoN) to improve robustness, reporting an average improvement of +7.9 in EM score with entirely noisy retrieved documents and +10.5 in rejection rate for out-of-scope questions.

Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses and can cause the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. When knowledge is lacking and the answer is unattainable, these systems should ideally respond with "unknown". In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs when facing noisy, irrelevant documents and when handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for the retrieved documents, enabling a thorough evaluation of their relevance to the given question, and to integrate this information when formulating the final answer. We employed ChatGPT to create training data for CoN, which was subsequently used to train a LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
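
The note-then-answer structure described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract describes fine-tuning a model on generated notes, whereas the sketch below only shows the same structure expressed as a prompt. The function names (build_con_prompt, parse_final_answer), the prompt wording, and the "Final answer:" marker are all hypothetical and chosen for illustration.

```python
from typing import List

UNKNOWN = "unknown"  # fallback answer; the exact string is an assumption


def build_con_prompt(question: str, documents: List[str]) -> str:
    """Assemble a prompt asking for one reading note per passage, then an answer.

    The wording is illustrative only; the paper's actual prompts are not
    given in the abstract.
    """
    lines = [
        "Task: read each retrieved passage, write a short note on whether it "
        "helps answer the question, then give a final answer.",
        f'If neither the passages nor your own knowledge suffice, answer "{UNKNOWN}".',
        "",
        f"Question: {question}",
        "",
    ]
    # List the retrieved passages in order.
    for i, doc in enumerate(documents, start=1):
        lines.append(f"Passage {i}: {doc}")
    lines.append("")
    # Request one sequential note per passage before the answer.
    for i in range(1, len(documents) + 1):
        lines.append(f"Note {i}: <relevance of passage {i} to the question>")
    lines.append("Final answer: <answer or unknown>")
    return "\n".join(lines)


def parse_final_answer(model_output: str) -> str:
    """Return the first line after the last 'Final answer:' marker.

    Falls back to UNKNOWN when the marker is missing or the answer is empty.
    """
    marker = "Final answer:"  # hypothetical output convention
    idx = model_output.rfind(marker)
    if idx == -1:
        return UNKNOWN
    answer = model_output[idx + len(marker):].strip()
    first_line = answer.splitlines()[0].strip() if answer else ""
    return first_line or UNKNOWN
```

A caller would pass the question and the retrieved passages to build_con_prompt, send the result to a language model of its choice, and run parse_final_answer on the reply. Treating a missing or empty answer as "unknown" is a design choice of this sketch, made so that malformed outputs count as rejections rather than as guesses.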

