CL, LG · Oct 22, 2022

Exploring The Landscape of Distributional Robustness for Question Answering Models

Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt

AI2, UW

arXiv:2210.12517v1 · 24.7302 citations · h-index: 82

Originality: Synthesis-oriented

AI Analysis

This work examines how robust question answering models are to distribution shifts and which modeling choices affect that robustness. It offers useful insights for natural language processing researchers, but its contribution is incremental: it is an empirical analysis and proposes no new methods.

The paper reports a large empirical evaluation of distributional robustness in question answering, covering over 350 models and 16 datasets. It finds that in-distribution performance often determines out-of-distribution performance, and that zero-shot and in-context learning methods are more robust than fully fine-tuned models.

We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets, including a diverse set of architectures, model sizes, and adaptation methods (e.g., fine-tuning, adapter tuning, in-context learning, etc.). We find that, in many cases, model variations do not affect robustness and in-distribution performance alone determines out-of-distribution performance. Moreover, our findings indicate that i) zero-shot and in-context learning methods are more robust to distribution shifts than fully fine-tuned models; ii) few-shot prompt fine-tuned models exhibit better robustness than few-shot fine-tuned span prediction models; iii) parameter-efficient and robustness enhancing training methods provide no significant robustness improvements. In addition, we publicly release all evaluations to encourage researchers to further analyze robustness trends for question answering models.
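The claim that in-distribution performance alone often determines out-of-distribution performance can be illustrated with a small sketch. The idea, under stated assumptions, is to fit a trend line through (in-distribution, out-of-distribution) scores of a set of baseline models and then ask how far another model sits above or below that line. The function names `fit_trend` and `robustness_gap`, the plain linear fit, and all scores below are hypothetical and chosen for illustration; they are not taken from the paper or its released evaluations.

```python
def fit_trend(id_scores, ood_scores):
    """Ordinary least-squares line: ood ~ slope * id + intercept."""
    n = len(id_scores)
    mean_x = sum(id_scores) / n
    mean_y = sum(ood_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(id_scores, ood_scores))
    sxx = sum((x - mean_x) ** 2 for x in id_scores)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def robustness_gap(id_score, ood_score, slope, intercept):
    """OOD score minus the OOD score the baseline trend predicts at this ID score."""
    return ood_score - (slope * id_score + intercept)


# Made-up F1 scores for four hypothetical baseline models (illustration only).
baseline_id = [60.0, 70.0, 80.0, 90.0]
baseline_ood = [40.0, 50.0, 60.0, 70.0]
slope, intercept = fit_trend(baseline_id, baseline_ood)

# A hypothetical model with lower ID accuracy but relatively high OOD accuracy.
gap = robustness_gap(50.0, 45.0, slope, intercept)
```

A gap near zero means the model's out-of-distribution score is what its in-distribution score would predict, which is the "no effect on robustness" case the abstract describes; a clearly positive gap would correspond to a model that is more robust than the baseline trend.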
