CL · Mar 3, 2024

Answerability in Retrieval-Augmented Open-Domain Question Answering

arXiv:2403.01461v1 · h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses a specific issue in ODQA that matters to researchers and practitioners, but the advance is incremental because it builds on existing datasets and methods.

The paper tackles the problem of irrelevant text excerpts returned by retrieval systems in Open-Domain Question Answering (ODQA). It examines models trained on questions paired with random excerpts and finds that their accuracy drops from 98% to 1% on irrelevant excerpts with high semantic overlap. Training on unanswerable pairs from SQuAD 2.0 brings accuracy on these excerpts to nearly 100%.

Retrieval systems for Open-Domain Question Answering (ODQA) can behave sub-optimally, returning text excerpts with varying degrees of irrelevance. Unfortunately, many existing ODQA datasets lack examples that specifically target the identification of irrelevant text excerpts. Previous attempts to address this gap have relied on a simplistic approach: pairing questions with random text excerpts. This paper investigates the effectiveness of models trained with this randomized strategy and uncovers an important limitation in their ability to generalize to irrelevant text excerpts with high semantic overlap. On such excerpts, we observed a substantial decrease in predictive accuracy, from 98% to 1%. To address this limitation, we identified an efficient approach for training models to recognize such excerpts. By leveraging unanswerable pairs from the SQuAD 2.0 dataset, our models achieve nearly perfect (~100%) accuracy when confronted with these challenging text excerpts.
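
To make the two data strategies in the abstract concrete, here is a minimal sketch of how each kind of negative training pair could be built. It is not the authors' code: the function names, the `"unanswerable"` label string, and the tuple layout are assumptions chosen for illustration. The only external convention relied on is the published SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`, with an `is_impossible` flag on each question).

```python
import random


def random_negative_pairs(examples, seed=0):
    """Randomized strategy: pair each question with another example's context.

    `examples` is a list of (question, context) tuples. This helper is
    hypothetical and assumes the contexts are distinct from one another.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to build random pairs")
    rng = random.Random(seed)
    negatives = []
    for i, (question, _own_context) in enumerate(examples):
        # Sample an index uniformly from all positions except i.
        j = rng.randrange(len(examples) - 1)
        if j >= i:
            j += 1
        negatives.append((question, examples[j][1], "unanswerable"))
    return negatives


def squad2_unanswerable_pairs(squad):
    """SQuAD 2.0 strategy: keep questions flagged as impossible.

    These questions were written against their own paragraph, so the
    context is topically close to the question but does not answer it.
    """
    pairs = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if qa.get("is_impossible", False):
                    pairs.append(
                        (qa["question"], paragraph["context"], "unanswerable")
                    )
    return pairs
```

The contrast is in where the context comes from. Random pairing mostly yields excerpts that share little with the question, whereas the SQuAD 2.0 pairs keep the excerpt on the same topic as the question, which is the high-semantic-overlap case the abstract reports random-pair models failing on.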
