CL · Feb 20, 2023

A Two-Sided Discussion of Preregistration of NLP Research

Anders Søgaard, Daniel Hershcovich, Miryam de Lhoneux

arXiv:2302.10086v128.2267 citations · h-index: 46

Originality: Synthesis-oriented

AI Analysis

This is an incremental discussion of methodological practices for NLP researchers, highlighting both benefits and risks of preregistration.

The paper discusses the proposal to adopt preregistration in NLP research to address methodological issues like fishing expeditions and lack of negative results, but it critically examines potential drawbacks such as bias toward confirmatory research and increased publication bias.

Van Miltenburg et al. (2021) suggest NLP research should adopt preregistration to prevent fishing expeditions and to promote publication of negative results. At face value, this is a very reasonable suggestion, seemingly solving many methodological problems with NLP research. We discuss pros and cons -- some old, some new: a) Preregistration is challenged by the practice of retrieving hypotheses after the results are known; b) preregistration may bias NLP toward confirmatory research; c) preregistration must allow for reclassification of research as exploratory; d) preregistration may increase publication bias; e) preregistration may increase flag-planting; f) preregistration may increase p-hacking; and finally, g) preregistration may make us less risk tolerant. We cast our discussion as a dialogue, presenting both sides of the debate.
