Guaranteeing Reproducibility in Deep Learning Competitions
This work addresses reproducibility issues facing researchers and practitioners in machine learning competitions, though its contribution is incremental, since it builds on existing competition frameworks.
The paper tackles the problem of irreproducible results in deep learning competitions by proposing a new challenge paradigm in which organizers re-train submitted methods in a controlled setting, which guarantees reproducibility and helps ensure generalization to held-out test sets.
To encourage the development of methods with reproducible and robust training behavior, we propose a challenge paradigm where competitors are evaluated directly on the performance of their learning procedures rather than pre-trained agents. Since competition organizers re-train proposed methods in a controlled setting, they can guarantee reproducibility and, by re-training submissions using a held-out test set, help ensure generalization past the environments on which the methods were originally trained.
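The organizer-side evaluation loop described above can be illustrated with a minimal sketch. It is a toy stand-in, not the competition's actual harness: a one-dimensional regression task replaces the real environments, and the names `make_dataset`, `least_squares_procedure`, and `evaluate_submission` are hypothetical. The point it shows is that the submission is a learning procedure (a function from data and seed to a model), and that the organizer re-trains it with fixed seeds on a task the competitor never saw.

```python
import random
import statistics
from typing import Callable, List, Tuple

# Hypothetical types: a submission is a *learning procedure*, not a trained model.
Dataset = List[Tuple[float, float]]
Model = Callable[[float], float]
LearningProcedure = Callable[[Dataset, int], Model]


def make_dataset(slope: float, n: int, seed: int) -> Dataset:
    """Toy stand-in for an environment: noisy samples of y = slope * x."""
    rng = random.Random(seed)  # private RNG so results do not depend on global state
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, 0.01)))
    return data


def least_squares_procedure(train: Dataset, seed: int) -> Model:
    """Example submission: fits y = w * x in closed form (the seed is unused here)."""
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    w = sxy / sxx
    return lambda x: w * x


def evaluate_submission(
    procedure: LearningProcedure, held_out_slope: float, seeds: List[int]
) -> List[float]:
    """Organizer side: re-train on a held-out task with fixed seeds, then score."""
    scores = []
    for seed in seeds:
        train = make_dataset(held_out_slope, 200, seed)
        test = make_dataset(held_out_slope, 200, seed + 10_000)  # disjoint seed
        model = procedure(train, seed)  # training happens under organizer control
        mse = statistics.fmean((model(x) - y) ** 2 for x, y in test)
        scores.append(mse)
    return scores


if __name__ == "__main__":
    # The slope 3.0 plays the role of a held-out task unknown to competitors.
    first = evaluate_submission(least_squares_procedure, 3.0, seeds=[0, 1, 2])
    second = evaluate_submission(least_squares_procedure, 3.0, seeds=[0, 1, 2])
    print(first == second)  # identical scores when the run is repeated
```

Because every source of randomness is seeded by the organizer, repeating the evaluation reproduces the same scores, and because the task parameter is chosen by the organizer, a procedure that only works on the competitor's own data would score poorly.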