CL AI, Apr 9, 2022

Extending the Scope of Out-of-Domain: Examining QA models in multiple subdomains

Chenyang Lyu, Jennifer Foster, Yvette Graham

arXiv:2204.04534v131.9641 citations, h-index: 24, Has Code

Originality: Incremental advance

AI Analysis

For researchers and practitioners, this work highlights a critical limitation in the generalizability of QA systems: biases arising from internal dataset characteristics. The contribution is incremental, since it extends prior out-of-domain studies.

The paper addresses out-of-domain generalization in QA systems by examining performance across subdomains defined by internal dataset characteristics such as question type and text length. It finds that performance drops significantly when training and test data come from different subdomains, with F1 reductions of up to 15%.

Past work investigating the out-of-domain performance of QA systems has mainly focused on general domains (e.g. the news domain, the Wikipedia domain), underestimating the importance of subdomains defined by the internal characteristics of QA datasets. In this paper, we extend the scope of "out-of-domain" by splitting QA examples into different subdomains according to several internal characteristics, including question type, text length, and answer position. We then examine the performance of QA systems trained on data from different subdomains. Experimental results show that the performance of QA systems can be significantly reduced when the training data and test data come from different subdomains. These results call into question the generalizability of current QA systems across multiple subdomains, suggesting the need to combat the bias introduced by the internal characteristics of QA datasets.
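
The subdomain split described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the SQuAD-style field names (question, context, answer_start), the wh-word list, the 100-token length threshold, and the front/back rule for answer position are all hypothetical, since the abstract does not specify how each characteristic is measured.

```python
from collections import defaultdict

# Hypothetical wh-words used to assign a question type; the paper's exact
# categories are not given in the abstract.
QUESTION_WORDS = ("what", "who", "when", "where", "which", "why", "how")


def question_type(question):
    """Return the first wh-word found in the question, or 'other'."""
    for token in question.lower().split():
        word = token.strip("?,.")
        if word in QUESTION_WORDS:
            return word
    return "other"


def length_bucket(text, threshold=100):
    """Label text 'short' or 'long' by whitespace token count (assumed threshold)."""
    return "short" if len(text.split()) < threshold else "long"


def answer_position(context, answer_start):
    """Label the answer 'front' or 'back' by which half of the context it starts in."""
    return "front" if answer_start < len(context) / 2 else "back"


def split_by(examples, key_fn):
    """Group examples into subdomains keyed by key_fn(example)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[key_fn(ex)].append(ex)
    return dict(groups)


# Toy SQuAD-style examples (made up for this sketch).
examples = [
    {"question": "Who wrote Hamlet?",
     "context": "Hamlet was written by William Shakespeare.",
     "answer_start": 22},
    {"question": "When was it written?",
     "context": "Around 1600, Shakespeare wrote Hamlet.",
     "answer_start": 7},
    {"question": "In which city is the Globe?",
     "context": "The Globe Theatre is in London.",
     "answer_start": 24},
]

by_type = split_by(examples, lambda ex: question_type(ex["question"]))
by_position = split_by(
    examples, lambda ex: answer_position(ex["context"], ex["answer_start"])
)
```

Under these assumptions, a cross-subdomain experiment would train a QA model on one group (for example by_position["front"]) and evaluate it on another (by_position["back"]), then compare the result with the in-subdomain score.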
