CL · May 23, 2023

Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions

arXiv:2305.14441v1 · 133 citations
Originality: Incremental advance
AI Analysis

This work addresses the robustness of open-domain question answering systems, which matters to users who rely on predictions staying correct when a question is slightly perturbed. It represents an incremental improvement to both evaluation and training methods.

The paper studies contrast consistency in open-domain question answering by collecting minimally edited questions as contrast sets. On these sets, the widely used dense passage retriever (DPR) performs poorly despite strong results on standard test sets. The authors then introduce a query-side contrastive loss that improves consistency without sacrificing standard accuracy.
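To make the evaluation concrete, the sketch below computes contrast consistency for a retriever in the sense commonly used for contrast sets: a pair counts only if the model is correct on both the original question and its minimally edited counterpart. The `RetrievalResult` record, the top-k "gold passage retrieved" criterion, and the toy passage IDs are illustrative assumptions, not the paper's exact evaluation protocol.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RetrievalResult:
    """Hypothetical record: ranked passage IDs for one question plus its gold passage ID."""
    ranked_ids: List[str]
    gold_id: str

    def hit_at(self, k: int) -> bool:
        # Treat the question as answered correctly if the gold passage is in the top k.
        return self.gold_id in self.ranked_ids[:k]


Pair = Tuple[RetrievalResult, RetrievalResult]  # (original question, minimally edited question)


def original_accuracy(pairs: Sequence[Pair], k: int = 1) -> float:
    """Top-k accuracy on the original questions only (the standard test-set view)."""
    if not pairs:
        raise ValueError("contrast set is empty")
    return sum(orig.hit_at(k) for orig, _ in pairs) / len(pairs)


def contrast_consistency(pairs: Sequence[Pair], k: int = 1) -> float:
    """Share of pairs where BOTH the original and the edited question are correct."""
    if not pairs:
        raise ValueError("contrast set is empty")
    return sum(orig.hit_at(k) and edit.hit_at(k) for orig, edit in pairs) / len(pairs)


# Toy contrast set: the first edited question still retrieves the original passage first.
pairs = [
    (RetrievalResult(["p1", "p2"], "p1"), RetrievalResult(["p1", "p3"], "p3")),
    (RetrievalResult(["p4", "p5"], "p4"), RetrievalResult(["p6", "p4"], "p6")),
]

print(original_accuracy(pairs))          # 1.0
print(contrast_consistency(pairs))       # 0.5
print(contrast_consistency(pairs, k=2))  # 1.0
```

In the toy data both original questions are retrieved correctly, yet one edited question still returns the original question's passage at rank 1, so accuracy on the originals is 1.0 while contrast consistency at k=1 is only 0.5. This is a miniature version of the gap between standard test performance and contrast-set performance described above.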

Contrast consistency, the ability of a model to make consistently correct predictions in the presence of perturbations, is an essential aspect in NLP. While studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. In this work, we collect minimally edited questions as challenging contrast sets to evaluate OpenQA models. Our collection approach combines both human annotation and large language model generation. We find that the widely used dense passage retriever (DPR) performs poorly on our contrast sets, despite fitting the training set well and performing competitively on standard test sets. To address this issue, we introduce a simple and effective query-side contrastive loss with the aid of data augmentation to improve DPR training. Our experiments on the contrast sets demonstrate that DPR's contrast consistency is improved without sacrificing its accuracy on the standard test sets.
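A minimal sketch of how such a training objective could be assembled, assuming dense question and passage vectors are already computed. The passage term follows DPR's standard recipe of scoring by inner product and ranking the gold passage above negative passages; the query-side term is an assumed InfoNCE form in which an augmented version of the question (for example, a paraphrase) is the positive and minimally edited questions are the negatives. `combined_loss`, `weight`, and `temperature` are hypothetical names, and the paper's exact formulation may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    # DPR scores question-passage pairs by inner product.
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 1.0) -> float:
    """Negative log-softmax probability of `positive` among {positive} + negatives."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def combined_loss(q: Vector, gold_passage: Vector, negative_passages: Sequence[Vector],
                  q_augmented: Vector, q_edited: Sequence[Vector],
                  weight: float = 1.0) -> float:
    """Hypothetical objective: DPR passage loss plus a weighted query-side term."""
    # DPR-style term: rank the gold passage above negative passages.
    passage_term = info_nce(q, gold_passage, negative_passages)
    # Query-side term (assumed form): keep the question close to its augmented
    # variant and away from minimally edited questions that need different answers.
    query_term = info_nce(q, q_augmented, q_edited)
    return passage_term + weight * query_term


# The query-side term shrinks as the edited question's embedding moves away.
q, q_aug = [1.0, 0.0], [1.0, 0.0]
print(info_nce(q, q_aug, [[0.9, 0.1]]))  # about 0.644
print(info_nce(q, q_aug, [[0.0, 1.0]]))  # about 0.313
```

Adding the two terms with a weight is one simple way to balance standard retrieval accuracy against sensitivity to small question edits; the weighting shown here is purely illustrative.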

Code implementations: 1 repo