CL · Sep 11, 2021

What's in a Name? Answer Equivalence For Open-Domain Question Answering

arXiv:2109.05289v1 · 665 citations
Originality: Incremental advance
AI Analysis

This paper addresses evaluation inaccuracies in QA systems, but the advance is incremental because it builds on existing methods for answer expansion.

The paper tackles a flaw in open-domain question answering evaluation: annotations often provide only one gold answer, so predictions that are correct but worded differently are marked wrong. It mines entity aliases from knowledge bases and uses them as additional equivalent answers. This answer expansion raises exact match scores on Natural Questions, TriviaQA, and SQuAD at evaluation time, and using the equivalent answers during training helps on real-world datasets.

A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA, and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.
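The effect of answer expansion on exact match can be illustrated with a minimal sketch. It is not the authors' implementation: the `ALIAS_TABLE` dictionary is a hypothetical stand-in for a knowledge-base alias lookup, the helper names are made up for illustration, and the normalization follows the common SQuAD-style convention (lowercase, strip punctuation and articles), which the paper may or may not use in exactly this form.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def expand_answers(gold_answers, alias_table):
    """Return the gold answers plus any aliases found for them.

    alias_table is a hypothetical mapping from a normalized entity name
    to its aliases; a real system would mine these from a knowledge base.
    """
    expanded = list(gold_answers)
    for answer in gold_answers:
        for alias in alias_table.get(normalize(answer), []):
            if alias not in expanded:
                expanded.append(alias)
    return expanded


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


# Toy alias table (assumption, for illustration only).
ALIAS_TABLE = {
    "united states": ["USA", "United States of America", "U.S."],
}

gold = ["United States"]
prediction = "the USA"

em_original = exact_match(prediction, gold)
em_expanded = exact_match(prediction, expand_answers(gold, ALIAS_TABLE))
print(em_original, em_expanded)
```

With a single gold answer the prediction is scored as wrong; once the aliases are added to the gold set, the same prediction counts as an exact match. The same expanded set could also serve as extra training targets, which is the second setting the abstract describes.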

Code Implementations: 1 repo
