A Step Toward Quantifying Independently Reproducible Machine Learning Research
This work addresses the reproducibility crisis in ML research by providing empirical data, though it is incremental in that it relies on manual analysis rather than automated tooling.
The authors quantify independent reproducibility in machine learning research by manually implementing 255 papers published from 1984 to 2017 and statistically analyzing features recorded for each paper. They find that releasing code alone is insufficient for reproducibility.
What makes a paper independently reproducible? Debates on reproducibility center on intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important but not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, to avoid being biased by discrepancies between code and paper.