SE · Mar 4, 2021

Restoring Execution Environments of Jupyter Notebooks

arXiv:2103.02959v1 · 52 citations
Originality: Incremental advance
AI Analysis

This work addresses reproducibility issues in scientific computing for researchers and data scientists, though it is an incremental advance because it builds on existing dependency analysis methods.

The paper tackles the problem of Jupyter notebooks that lack dependency information, which hinders reproducibility. It presents SnifferDog, an approach that restores execution environments and makes the vast majority of notebooks immediately executable.

More than ninety percent of published Jupyter notebooks do not state their dependencies on external packages. This makes them non-executable and thus hinders the reproducibility of scientific results. We present SnifferDog, an approach that 1) collects the APIs of Python packages and versions, creating a database of APIs; 2) analyzes notebooks to determine candidates for required packages and versions; and 3) checks which packages are required to make the notebook executable (and ideally, reproduce its stored results). In our evaluation, we show that SnifferDog precisely restores execution environments for the vast majority of notebooks, making them immediately executable for end users.
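The first two steps of the abstract can be illustrated with a minimal Python sketch. This is not SnifferDog's implementation: the package `examplepkg`, its versions, the hand-written `API_DB`, and all function names are hypothetical, and the sketch assumes that a package's import name equals its distribution name, which does not hold in general. It reads the code cells of a notebook, collects the APIs they use, and keeps the package versions whose API set covers all of them. Step 3, actually executing the notebook in a candidate environment, is left out.

```python
from __future__ import annotations

import ast
import json

# Hypothetical API database (step 1): package -> version -> exported APIs.
# A real database would be mined from released package versions.
API_DB = {
    "examplepkg": {
        "1.0": {"examplepkg.load"},
        "2.0": {"examplepkg.load", "examplepkg.new_func"},
    },
}

# Hypothetical notebook in the standard .ipynb JSON layout.
NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code",
         "source": ["%matplotlib inline\n", "import examplepkg as ex\n", "ex.load()\n"]},
        {"cell_type": "code",
         "source": ["from examplepkg import new_func\n", "new_func(ex.load())\n"]},
    ]
})


def notebook_source(notebook_json: str) -> str:
    """Join all code cells, dropping IPython magics and shell escapes."""
    notebook = json.loads(notebook_json)
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if not line.lstrip().startswith(("%", "!")):
                lines.append(line)
    return "\n".join(lines)


def used_apis(source: str) -> set[str]:
    """Return qualified names such as 'examplepkg.load' used in the source."""
    tree = ast.parse(source)
    modules = {}  # local alias -> module name (import x / import x as y)
    direct = {}   # local name -> qualified API (from x import y)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    modules[alias.asname] = alias.name
                else:
                    top = alias.name.split(".")[0]
                    modules[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                direct[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if node.value.id in modules:
                used.add(f"{modules[node.value.id]}.{node.attr}")
        elif isinstance(node, ast.Name) and node.id in direct:
            used.add(direct[node.id])
    return used


def restore_candidates(notebook_json: str, api_db: dict) -> dict[str, list[str]]:
    """Step 2: per package, list the versions that provide every used API."""
    used = used_apis(notebook_source(notebook_json))
    candidates = {}
    for package, versions in api_db.items():
        needed = {name for name in used if name.startswith(package + ".")}
        if needed:
            # Plain string sort; real version ordering would need a version parser.
            candidates[package] = sorted(
                version for version, apis in versions.items() if needed <= apis
            )
    return candidates


print(restore_candidates(NOTEBOOK, API_DB))  # {'examplepkg': ['2.0']}
```

Version 1.0 is rejected because it lacks `examplepkg.new_func`, which the second cell uses. Narrowing candidates by API coverage in this way only proposes environments; confirming one still requires running the notebook, as the third step of the abstract describes.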

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
