CL · LG · Oct 24, 2022

We need to talk about random seeds

arXiv:2210.13393v1 · 18 citations · h-index: 46
Originality: Synthesis-oriented
AI Analysis

The paper highlights a methodological flaw in machine learning research that could undermine the reliability of reported results, which matters particularly to researchers and practitioners in NLP and related fields.

The paper critiques the misuse of random seeds in neural network research, identifying risky practices like using fixed seeds for replicability and varying only seeds for performance comparisons, and reports that over 50% of 85 recent ACL Anthology publications exhibit such issues.

Modern neural network libraries all take as a hyperparameter a random seed, typically used to determine the initial state of the model parameters. This opinion piece argues that there are some safe uses for random seeds: as part of the hyperparameter search to select a good model, creating an ensemble of several models, or measuring the sensitivity of the training algorithm to the random seed hyperparameter. It argues that some uses for random seeds are risky: using a fixed random seed for "replicability" and varying only the random seed to create score distributions for performance comparison. An analysis of 85 recent publications from the ACL Anthology finds that more than 50% contain risky uses of random seeds.
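Two of the safe uses named in the abstract, including the seed in the hyperparameter search and measuring sensitivity to the seed, can be sketched as follows. This is a minimal illustration, not the paper's code: `train_and_evaluate` is a hypothetical stand-in for a real training run, and its scoring formula, the learning-rate grid, and the seed range are all assumptions made up for the example.

```python
import random
import statistics


def train_and_evaluate(seed: int, learning_rate: float) -> float:
    """Hypothetical stand-in for a training run; returns a dev-set score.

    The formula below is invented for illustration only. In practice this
    would train a model whose initialization is controlled by `seed`.
    """
    rng = random.Random(seed)              # the seed fixes the "initialization"
    seed_effect = rng.gauss(0.0, 0.02)     # seed-dependent variation in score
    base = 0.80 - abs(learning_rate - 0.01) * 5.0
    return base + seed_effect


def search(seeds, learning_rates):
    """Safe use: treat the seed as one more hyperparameter in the search."""
    results = {
        (seed, lr): train_and_evaluate(seed, lr)
        for seed in seeds
        for lr in learning_rates
    }
    best = max(results, key=results.get)   # best (seed, lr) on dev data
    return best, results


def seed_sensitivity(seeds, learning_rate):
    """Safe use: report how much the score moves when only the seed changes."""
    scores = [train_and_evaluate(seed, learning_rate) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    best, _ = search(range(5), [0.001, 0.01, 0.1])
    mean, spread = seed_sensitivity(range(5), 0.01)
    print("best (seed, learning rate):", best)
    print(f"mean score {mean:.3f}, std over seeds {spread:.3f}")
```

In this sketch the standard deviation describes the training algorithm's sensitivity to the seed. Following the abstract's argument, a score distribution built by varying only the seed should not by itself serve as the basis for comparing the performance of two systems.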
