CL · Mar 9

Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code

arXiv:2603.08195v1 · 85.2 · Has Code
Predicted impact top 8% in CL · last 90 days · Originality Incremental advance
AI Analysis

This work aims to improve workflow reproducibility and understanding for bioinformatics researchers by bridging the gap between narrative descriptions and workflow implementations. It is assessed as an incremental advance.

This paper introduces CoPaLink, an automated approach to link bioinformatics tools mentioned in research papers with their corresponding implementations in executable workflow code. It achieves a joint accuracy of 66% on Nextflow workflows, with individual F1-measures ranging from 84% to 89% for named entity recognition and entity linking.

Motivation: The rapid growth of biological data has intensified the need for transparent, reproducible, and well-documented computational workflows. Clearly connecting the steps of a workflow in the code with their description in a paper would improve workflow understanding, support reproducibility, and facilitate reuse. This task requires linking the bioinformatics tools used in workflow code with their mentions in a published workflow description.

Results: We present CoPaLink, an automated approach that integrates three components: Named Entity Recognition (NER) for identifying tool mentions in scientific text, NER for tool mentions in workflow code, and entity linking grounded in bioinformatics knowledge bases. We propose approaches for all three steps, achieving high individual F1-measures (84% to 89%) and a joint accuracy of 66% when evaluated on Nextflow workflows using the Bioconda and Bioweb knowledge bases. CoPaLink leverages corpora of scientific articles and executable workflow code with curated tool annotations to bridge the gap between narrative descriptions and workflow implementations.

Availability: The code is available at https://gitlab.liris.cnrs.fr/sharefair/copalink-experiments and https://gitlab.liris.cnrs.fr/sharefair/copalink. The corpora are also available at https://doi.org/10.5281/zenodo.18526700, https://doi.org/10.5281/zenodo.18526760 and https://doi.org/10.5281/zenodo.18543814.
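To make the three-step idea concrete, here is a minimal sketch of how tool mentions in a paper and in workflow code could be linked through a shared knowledge base. It is not CoPaLink's implementation: the abstract describes trained NER components, whereas this sketch substitutes plain dictionary lookup. The names `TOY_KB`, `normalize`, `find_mentions`, and `link_paper_to_code`, as well as the aliases and sample strings, are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy knowledge base: canonical tool ID -> known aliases.
# A real system would draw these from resources such as Bioconda.
TOY_KB = {
    "fastqc": {"fastqc"},
    "bwa": {"bwa", "bwa-mem"},
    "samtools": {"samtools"},
}


def normalize(text):
    """Lowercase and collapse every non-alphanumeric run into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_mentions(text, kb=TOY_KB):
    """Dictionary-lookup stand-in for NER plus entity linking.

    Returns the set of canonical tool IDs whose aliases appear as
    whole tokens (or token sequences) in the text.
    """
    padded = " " + normalize(text) + " "
    found = set()
    for tool_id, aliases in kb.items():
        for alias in aliases:
            if " " + normalize(alias) + " " in padded:
                found.add(tool_id)
                break
    return found


def link_paper_to_code(paper_text, workflow_code, kb=TOY_KB):
    """Pair up tools mentioned in the paper with tools used in the code."""
    paper_tools = find_mentions(paper_text, kb)
    code_tools = find_mentions(workflow_code, kb)
    return {
        "linked": sorted(paper_tools & code_tools),
        "paper_only": sorted(paper_tools - code_tools),
        "code_only": sorted(code_tools - paper_tools),
    }


if __name__ == "__main__":
    paper = "Reads were checked with FastQC and aligned using BWA-MEM."
    code = 'process ALIGN { script: "bwa mem ref.fa reads.fq | samtools sort -o out.bam" }'
    print(link_paper_to_code(paper, code))
```

Both sides are resolved to the same canonical ID before they are compared, so differing surface forms ("BWA-MEM" in prose, "bwa mem" in a script) can still match. The "paper_only" and "code_only" buckets show where the description and the implementation diverge.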
