CL | AI | Apr 9, 2022

Extending the Scope of Out-of-Domain: Examining QA models in multiple subdomains

arXiv:2204.04534v1 | 641 citations | h-index: 24
AI Analysis

This work highlights a critical limitation in QA system generalizability for researchers and practitioners, revealing biases from internal dataset characteristics, though it is incremental as it extends prior out-of-domain studies.

The paper tackles out-of-domain generalization in QA systems by examining performance across subdomains defined by internal dataset characteristics such as question type and text length. It finds that performance drops significantly when train and test data come from different subdomains, with reductions of up to 15% in F1 score.

Past works that investigate the out-of-domain performance of QA systems have mainly focused on general domains (e.g. the news domain, the Wikipedia domain), underestimating the importance of subdomains defined by the internal characteristics of QA datasets. In this paper, we extend the scope of "out-of-domain" by splitting QA examples into different subdomains according to several of their internal characteristics, including question type, text length, and answer position. We then examine the performance of QA systems trained on data from different subdomains. Experimental results show that the performance of QA systems can be significantly reduced when the train data and test data come from different subdomains. These results question the generalizability of current QA systems across multiple subdomains, suggesting the need to combat the bias introduced by the internal characteristics of QA datasets.
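As a rough illustration of the subdomain split described above, the sketch below groups SQuAD-style examples by question type, text length, and answer position. The abstract does not give the paper's actual criteria, so every name and threshold here (`question_type`, `length_bucket`, `answer_position`, `split_by`, the 100-token cutoff, the half-context boundary) is a hypothetical choice for illustration, not the authors' method.

```python
from collections import defaultdict

# Assumed set of question words; the paper's own taxonomy may differ.
QUESTION_WORDS = ("what", "who", "when", "where", "why", "how", "which")


def question_type(question):
    """Return the first question word found in the question, else 'other'."""
    for token in question.lower().split():
        word = token.strip("?,.'\"")
        if word in QUESTION_WORDS:
            return word
    return "other"


def length_bucket(context, threshold=100):
    """'short' if the context has fewer than `threshold` whitespace tokens (assumed cutoff)."""
    return "short" if len(context.split()) < threshold else "long"


def answer_position(context, answer_start):
    """'front' if the answer starts in the first half of the context (by character), else 'back'."""
    return "front" if answer_start < len(context) / 2 else "back"


def split_by(examples, key_fn):
    """Group examples into subdomains keyed by key_fn(example)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[key_fn(ex)].append(ex)
    return dict(groups)


# Toy examples in a SQuAD-like shape (answer_start is a character offset).
examples = [
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet was written by William Shakespeare.",
     "answer_start": 22},
    {"question": "When was the bridge opened?",
     "context": "In 1937 the bridge was opened to traffic.",
     "answer_start": 3},
    {"question": "In which city is the tower?",
     "context": "The tower stands in Paris.",
     "answer_start": 20},
]

by_type = split_by(examples, lambda ex: question_type(ex["question"]))
by_position = split_by(
    examples, lambda ex: answer_position(ex["context"], ex["answer_start"])
)
```

Under this kind of split, a cross-subdomain experiment would train a QA model on one group (for example `by_position["front"]`) and evaluate it on another (`by_position["back"]`), then compare against the matched train/test setting.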

Code Implementations: 1 repo
