SE · May 13

ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment

arXiv:2605.13275 · 45.4
Predicted impact top 56% in SE · last 90 days
Originality Incremental advance
AI Analysis

For digital libraries curating research software, this work provides a scalable framework to assess reproducibility without conflating repository completeness with runnability, addressing a critical gap in automated curation.

ReproScore separates static repository readiness from actual execution outcome. Across 423 GitHub repositories, the readiness score shows near-zero correlation with binary execution success, indicating that readiness metrics alone are a poor predictor of whether a repository runs.
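To make the "near-zero correlation" claim concrete, here is a minimal sketch of how a continuous readiness score can be correlated with a binary run/fail outcome. It uses the point-biserial coefficient (Pearson correlation with a 0/1 variable), which is an assumption: the summary does not say which statistic the paper reports. The function name and the toy data are hypothetical and are not taken from the paper's corpus.

```python
from math import sqrt


def point_biserial(scores, outcomes):
    """Pearson correlation between a continuous score and a 0/1 outcome.

    Assumption: this is one common choice of statistic, not necessarily
    the one used in the paper.
    """
    if len(scores) != len(outcomes) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(scores)
    mean_s = sum(scores) / n
    mean_o = sum(outcomes) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(scores, outcomes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_o = sum((o - mean_o) ** 2 for o in outcomes)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return cov / sqrt(var_s * var_o)


# Made-up toy data: high and low readiness scores both appear among
# successes and failures, so the score carries no information about outcome.
toy_rrs = [0.2, 0.8, 0.4, 0.6]
toy_success = [1, 1, 0, 0]
print(f"toy correlation: {point_biserial(toy_rrs, toy_success):.3f}")
```

A coefficient near zero on real data would mean that ranking repositories by readiness says little about which ones execute, which is the gap the summary describes.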

Digital libraries curate millions of research software artefacts yet lack scalable infrastructure for assessing whether those artefacts remain executable. Existing automated assessment tools treat static repository completeness -- what a repository contains -- as a proxy for execution success -- whether it runs. We term this the readiness-outcome conflation and present ReproScore, a two-tier framework that explicitly separates reproducibility readiness (RRS) from reproducibility outcome (ROS), combining them into a coverage-adaptive Composite Score (RCS). RRS comprises 26 sub-metrics across five categories; ROS provides execution-based probes when sandbox infrastructure is available; a community rubric externalises weighting priorities as versioned YAML profiles. Evaluated on 423 GitHub repositories from a large-scale ground-truth corpus spanning five failure modes, two complementary findings emerge: the environment category strongly discriminates failure mode, confirming static signals capture meaningful structural differences; yet RRS exhibits near-zero binary success correlation, empirically quantifying the readiness-outcome gap at repository scale. Together, these findings validate the architectural separation as both necessary and non-trivial, positioning ReproScore as scalable infrastructure for reproducibility-aware curation in digital library workflows.
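The two-tier design can be illustrated with a short sketch. Everything below is an assumption made for illustration: the abstract does not give the RCS formula, the profile schema, or the category names other than "environment". Here the profile is a plain Python object standing in for a versioned YAML file, RRS is a weighted mean of per-category scores, and "coverage-adaptive" is interpreted as giving the outcome score more weight as more execution probes are run, falling back to readiness alone when no sandbox is available.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for a versioned YAML weighting profile."""
    version: str
    category_weights: Dict[str, float]  # category name -> non-negative weight
    max_outcome_weight: float           # share given to ROS at full probe coverage


def readiness_score(category_scores: Dict[str, float], profile: Profile) -> float:
    """Weighted mean of per-category readiness scores, each in [0, 1]."""
    total = sum(profile.category_weights[c] for c in category_scores)
    if total <= 0:
        raise ValueError("profile weights must sum to a positive value")
    weighted = sum(
        score * profile.category_weights[c] for c, score in category_scores.items()
    )
    return weighted / total


def composite_score(
    rrs: float, ros: Optional[float], coverage: float, profile: Profile
) -> float:
    """Blend readiness and outcome; use readiness alone when no probes ran."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if ros is None or coverage == 0.0:
        return rrs
    w = profile.max_outcome_weight * coverage  # outcome weight grows with coverage
    return (1.0 - w) * rrs + w * ros


# "documentation" is a made-up category name; only "environment" is in the abstract.
profile = Profile("0.1-example", {"environment": 2.0, "documentation": 1.0}, 0.5)
rrs = readiness_score({"environment": 0.5, "documentation": 1.0}, profile)
print(f"static only: {composite_score(rrs, None, 0.0, profile):.3f}")
print(f"all probes failed: {composite_score(rrs, 0.0, 1.0, profile):.3f}")
```

Keeping the weights in a separate profile object mirrors the abstract's idea of externalising priorities: a community could change how categories are weighted by publishing a new profile version, without touching the scoring code.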

