Operationalizing Research Software for Supply Chain Security
This work addresses the need for a consistent operationalization of research software in supply chain security studies. The contribution is incremental: it builds on existing methods for taxonomy development and security analysis.
The authors tackled the problem of inconsistent definitions of 'research software' in empirical studies by introducing a taxonomy to standardize scope and operational boundaries, and demonstrated its utility by applying OpenSSF Scorecard to show how security signals vary across taxonomy-defined clusters.
Empirical studies of research software are hard to compare because the literature operationalizes "research software" inconsistently. Motivated by the research software supply chain (RSSC) and its security risks, we introduce an RSSC-oriented taxonomy that makes scope and operational boundaries explicit for empirical research software security studies. We conduct a targeted scoping review of recent repository mining and dataset construction studies, extracting each work's definition, inclusion criteria, unit of analysis, and identification heuristics. We synthesize these into a harmonized taxonomy and a mapping that translates prior approaches into shared taxonomy dimensions. We operationalize the taxonomy on a large community-curated corpus from the Research Software Encyclopedia (RSE), producing an annotated dataset, a labeling codebook, and a reproducible labeling pipeline. Finally, we apply OpenSSF Scorecard as a preliminary security analysis to show how repository-centric security signals differ across taxonomy-defined clusters and why taxonomy-aware stratification is necessary for interpreting RSSC security measurements.
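The stratification argument can be illustrated with a minimal sketch. It is not the authors' pipeline: the repository names, cluster labels, and scores below are invented for illustration, and the only grounded detail is that OpenSSF Scorecard reports an aggregate score on a 0 to 10 scale. The sketch compares a single pooled mean with per-cluster means to show how pooling can hide differences between taxonomy-defined clusters.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (repository, taxonomy cluster, aggregate Scorecard score, 0-10).
# Cluster names and scores are assumptions made up for this example.
RECORDS = [
    ("repo-a", "research-infrastructure", 7.0),
    ("repo-b", "research-infrastructure", 6.0),
    ("repo-c", "analysis-script", 2.0),
    ("repo-d", "analysis-script", 3.0),
    ("repo-e", "analysis-script", 4.0),
]


def pooled_mean(records):
    """Mean score over all repositories, ignoring taxonomy clusters."""
    return mean(score for _, _, score in records)


def stratified_means(records):
    """Mean score per taxonomy cluster."""
    by_cluster = defaultdict(list)
    for _, cluster, score in records:
        by_cluster[cluster].append(score)
    return {cluster: mean(scores) for cluster, scores in by_cluster.items()}


if __name__ == "__main__":
    print("pooled:", pooled_mean(RECORDS))
    for cluster, value in stratified_means(RECORDS).items():
        print(cluster, value)
```

In this toy data the pooled mean sits between the two cluster means and describes neither cluster well, which is the kind of interpretive problem that taxonomy-aware stratification is meant to avoid.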