Experimental Standards for Deep Learning in Natural Language Processing Research
This work tackles inconsistent experimental practices in NLP research. Its contribution is incremental, since it synthesizes existing discussions rather than introducing new techniques.
The paper addresses the lack of common experimental standards in deep learning for NLP by distilling ongoing discussions into a single, widely applicable methodology intended to strengthen experimental evidence, improve reproducibility, and support scientific progress. The standards are collected in a public repository so that they can be adapted to future needs.
The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP). Yet, compared to more established disciplines, a lack of common experimental standards remains an open challenge to the field at large. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in NLP into a single, widely applicable methodology. Following these best practices is crucial to strengthen experimental evidence, improve reproducibility, and support scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.