A Survey on Out-of-Distribution Evaluation of Neural NLP Models
This survey addresses the problem of fragmented research for NLP practitioners and researchers, but its contribution is incremental: it synthesizes existing work without presenting new empirical results.
This survey tackles the lack of an integrated discussion of out-of-distribution evaluation of neural NLP models. It compares adversarial robustness, domain generalization, and dataset biases under a unified definition, summarizes the data-generating processes and evaluation protocols for each, and highlights challenges and opportunities for future work.
Adversarial robustness, domain generalization, and dataset biases are three active lines of research contributing to out-of-distribution (OOD) evaluation of neural NLP models. However, a comprehensive, integrated discussion of the three research lines is still lacking in the literature. In this survey, we 1) compare the three lines of research under a unifying definition; 2) summarize the data-generating processes and evaluation protocols for each line of research; and 3) emphasize the challenges and opportunities for future work.