RepDL: Bit-level Reproducible Deep Learning Training and Inference
RepDL addresses a practical reproducibility problem for deep learning practitioners, though the contribution is incremental in that it builds on existing deterministic configurations.
The paper tackles the problem of non-determinism and non-reproducibility in deep learning, which cause inconsistent results across runs and platforms, by introducing RepDL, an open-source library that ensures deterministic and bitwise-reproducible training and inference.
Non-determinism and non-reproducibility present significant challenges in deep learning, leading to inconsistent results across runs and platforms. These issues stem from two origins: random number generation and floating-point computation. While randomness can be controlled through deterministic configurations, floating-point inconsistencies remain largely unresolved. To address this, we introduce RepDL, an open-source library that ensures deterministic and bitwise-reproducible deep learning training and inference across diverse computing environments. RepDL achieves this by enforcing correct rounding and order invariance in floating-point computation. The source code is available at https://github.com/microsoft/RepDL .
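To make the floating-point origin concrete, the following is a minimal sketch in plain Python. It is not RepDL's API or implementation. The helper names `naive_sum` and `fixed_order_sum` are hypothetical and chosen only for illustration. The sketch shows that floating-point addition is not associative, so the same numbers summed in a different order can give different results. It then shows two ways to get an order-independent result for finite inputs: accumulating in a canonical (sorted) order, and using `math.fsum`, which on typical IEEE 754 platforms returns the correctly rounded sum.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; the result depends on element order."""
    total = 0.0
    for v in values:
        total += v
    return total


def fixed_order_sum(values):
    """Accumulate in a canonical (sorted) order.

    Any permutation of the same finite values yields the same result.
    This gives reproducibility only; it does not improve accuracy.
    """
    return naive_sum(sorted(values))


# The same four numbers in two different orders.
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1e16, -1e16]

# Order-dependent: rounding happens at different points in each order.
print("naive:      ", naive_sum(a), naive_sum(b))

# Order-invariant: both inputs are summed in the same canonical order.
print("fixed order:", fixed_order_sum(a), fixed_order_sum(b))

# Correctly rounded: the exact sum is computed, then rounded once.
print("fsum:       ", math.fsum(a), math.fsum(b))
```

In this sketch the naive sums differ between the two orders, while the fixed-order and `math.fsum` results each agree across both. The fixed-order result is reproducible but need not equal the exact sum, which is why the abstract names correct rounding and order invariance as separate requirements.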