SE, Aug 3, 2018

DataDeps.jl: Repeatable Data Setup for Replicable Data Science

arXiv:1808.01091v1, 1 citation
Originality: Synthesis-oriented
AI Analysis

This tool addresses reproducibility problems for data scientists and researchers by automating the setup of data dependencies. The contribution is incremental, since it builds on existing data-management practices.

The authors address manual, error-prone data setup in replicable data science by introducing DataDeps.jl, a Julia package that automates the handling of static datasets. This improves reproducibility and makes research software easier to extend.

We present DataDeps.jl: a Julia package for the reproducible handling of static datasets to enhance the repeatability of scripts used in the data and computational sciences. It is used to automate the data setup part of running software which accompanies a paper to replicate a result. This step is commonly done manually, which costs time and invites confusion. This functionality is also useful for other packages which require data to function (e.g. a trained machine-learning-based model). DataDeps.jl simplifies extending research software by automatically managing the dependencies and makes it easier to run another author's code, thus enhancing the reproducibility of data science research.

Code Implementations: 2 repos