SELGSep 9, 2021

The challenge of reproducible ML: an empirical study on the impact of bugs

arXiv:2109.03991v17 citations
Originality Synthesis-oriented
AI Analysis

This addresses the reproducibility crisis for ML researchers by providing tools and methods to study non-determinism, though it is incremental as it builds on existing concerns without broad new solutions.

The paper tackles the problem of reproducibility in machine learning by investigating the impact of bugs in ML libraries on model performance, finding no evidence that bugs in PyTorch affect performance based on their dataset.

Reproducibility is a crucial requirement in scientific research. When results of research studies and scientific papers have been found difficult or impossible to reproduce, we face a challenge which is called reproducibility crisis. Although the demand for reproducibility in Machine Learning (ML) is acknowledged in the literature, a main barrier is inherent non-determinism in ML training and inference. In this paper, we establish the fundamental factors that cause non-determinism in ML systems. A framework, ReproduceML, is then introduced for deterministic evaluation of ML experiments in a real, controlled environment. ReproduceML allows researchers to investigate software configuration effects on ML training and inference. Using ReproduceML, we run a case study: investigation of the impact of bugs inside ML libraries on performance of ML experiments. This study attempts to quantify the impact that the occurrence of bugs in a popular ML framework, PyTorch, has on the performance of trained models. To do so, a comprehensive methodology is proposed to collect buggy versions of ML libraries and run deterministic ML experiments using ReproduceML. Our initial finding is that there is no evidence based on our limited dataset to show that bugs which occurred in PyTorch do affect the performance of trained models. The proposed methodology as well as ReproduceML can be employed for further research on non-determinism and bugs.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes