CL · Sep 4, 2019

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

arXiv:1909.01522v2
1008 citations
AI Analysis

This paper addresses a methodological flaw in low-resource NLP research: relying on development sets that would not exist in practice, which can mislead practitioners by over- or underestimating model performance.

The study examines whether using a development set for early stopping in low-resource NLP biases performance estimates relative to a more realistic setup that trains without one. Averaged over languages, absolute accuracy differs by at most 1.4%, but for some languages and tasks the gap reaches 18.0%.

Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results.
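
The two stopping strategies contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the accuracy curves are made-up numbers, the language names and function names are hypothetical, and averaging the best epoch over development languages is only one plausible way to tune the number of training epochs.

```python
from statistics import mean


def best_epoch(dev_accuracy_per_epoch):
    """Return the 1-based epoch with the highest dev accuracy (earliest wins ties)."""
    best = max(dev_accuracy_per_epoch)
    return dev_accuracy_per_epoch.index(best) + 1


def epochs_from_development_languages(curves_by_language):
    """Pick one epoch count for all target languages.

    Assumption: we average the best epoch found on each development
    language and round to the nearest integer.
    """
    return round(mean(best_epoch(curve) for curve in curves_by_language.values()))


# Hypothetical dev-accuracy curves (one value per epoch); not data from the paper.
target_curve = [0.40, 0.55, 0.61, 0.60, 0.58]
development_curves = {
    "lang_a": [0.30, 0.50, 0.52, 0.57, 0.56],
    "lang_b": [0.45, 0.48, 0.62, 0.66, 0.70],
    "lang_c": [0.35, 0.58, 0.64, 0.63, 0.60],
}

# Setup 1: early stopping on a development set of the target language itself.
early_stopping_epoch = best_epoch(target_curve)

# Setup 2: no target development set; the epoch count comes from other languages.
fixed_epoch = epochs_from_development_languages(development_curves)

print(early_stopping_epoch, fixed_epoch)
```

When the two setups select different epochs, the model evaluated on the test set differs as well, which is one way the reported accuracy can diverge between them.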
