Evidentiality-aware Retrieval for Overcoming Abstractiveness in Open-Domain Question Answering
This work addresses a key bottleneck in abstractive open-domain question answering (ODQA), offering researchers and practitioners a data-centric alternative to computationally expensive iterative annotation pipelines.
The paper tackles the scarcity of training data that labels which passages actually support an answer in abstractive open-domain question answering. It proposes Evidentiality-Aware Dense Passage Retrieval (EADPR), which uses synthetic distractor samples to teach the retriever to discriminate evidence passages from distractors, and validates the method on multiple abstractive ODQA tasks.
The long-standing goal of dense retrievers in abstractive open-domain question answering (ODQA) tasks is to learn to capture evidence passages among relevant passages for any given query, so that the reader can produce factually correct outputs from the evidence passages. One of the key challenges is the insufficient amount of training data with supervision on the answerability of passages. Recent studies rely on iterative pipelines that annotate answerability using signals from the reader, but their high computational costs hamper practical applications. In this paper, we instead focus on a data-centric approach and propose Evidentiality-Aware Dense Passage Retrieval (EADPR), which leverages synthetic distractor samples to learn to discriminate evidence passages from distractors. We conduct extensive experiments to validate the effectiveness of our proposed method on multiple abstractive ODQA tasks.
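The abstract does not state EADPR's training objective, so the following is only a minimal sketch of one way synthetic distractors could enter dense-retriever training. It assumes a DPR-style softmax contrastive loss over dot-product scores, in which each query's synthetic distractor is added as an extra hard negative alongside in-batch negatives. The function names `passage_nll` and `batch_loss` are hypothetical, embeddings are plain lists of floats, and how the distractors are generated is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between a query and a passage embedding."""
    return sum(a * b for a, b in zip(u, v))


def passage_nll(query: Vector, evidence: Vector, distractor: Vector,
                negatives: List[Vector]) -> float:
    """Negative log-likelihood of the evidence passage under a softmax over
    [evidence, distractor, *negatives] scores.

    Assumption (not from the paper): the synthetic distractor is treated as
    an additional hard negative, so the retriever is pushed to score true
    evidence above a relevant-looking but non-evidential passage.
    """
    candidates = [evidence, distractor] + list(negatives)
    scores = [dot(query, p) for p in candidates]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]  # index 0 is the evidence passage


def batch_loss(queries: List[Vector], evidences: List[Vector],
               distractors: List[Vector]) -> float:
    """Mean loss over a batch, using the other queries' evidence passages
    and distractors as in-batch negatives (a common DPR-style choice)."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        negatives = [evidences[j] for j in range(n) if j != i]
        negatives += [distractors[j] for j in range(n) if j != i]
        total += passage_nll(queries[i], evidences[i], distractors[i], negatives)
    return total / n
```

With a single query, the loss is small when the evidence embedding is far better aligned with the query than the distractor, and large in the reverse case. That gap is the discrimination signal such a loss would provide.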