DataDeps.jl: Repeatable Data Setup for Replicable Data Science
DataDeps.jl addresses reproducibility problems for data scientists and researchers by automating the setup of data dependencies; the contribution is incremental, as it builds on existing data-management practices.
The authors address the manual, error-prone data setup common in replicable data science by introducing DataDeps.jl, a Julia package that automates the handling of static datasets, which improves reproducibility and makes research software easier to extend.
We present DataDeps.jl: a Julia package for the reproducible handling of static datasets, which enhances the repeatability of scripts used in the data and computational sciences. It automates the data setup step of running the software that accompanies a paper in order to replicate a result. This step is commonly done manually, which costs time and invites confusion. The same functionality is also useful for other packages that require data to function (e.g., a trained machine-learning-based model). DataDeps.jl simplifies extending research software by managing data dependencies automatically, and it makes it easier to run another author's code, thus enhancing the reproducibility of data science research.