Automatic annotation of bioinformatics workflows with biomedical ontologies
The work addresses the difficulty the bioinformatics community faces in finding, sharing, and reusing workflows, although it is incremental: it applies an existing method to new data.
The paper tackles the problem of scarce, unstructured descriptions in legacy bioinformatics workflows by automatically annotating them with ontology terms. It annotated 530 workflows and more than 2600 services, and wherever annotation was possible, the Information Content of the assigned terms was comparable to that of manually curated resources.
Legacy scientific workflows, and the services within them, often present scarce and unstructured (i.e. textual) descriptions. This makes it difficult to find, share and reuse them, thus dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 myExperiment bioinformatics-related workflows, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, in cases where annotation was possible at all, the annotation quality was comparable to manually curated bioinformatics resources.
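The abstract does not say exactly how Information Content (IC) was computed. One common corpus-based formulation (Resnik-style) defines IC(t) = -log p(t), where p(t) is the fraction of annotations that use term t or any of its descendants. Under that definition, specific terms score high and generic ones score near zero. The sketch below assumes this formulation. The toy ontology, term names, and helper functions (`ancestors`, `information_content`, `mean_ic`) are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> list of parent terms (a DAG).
PARENTS = {
    "operation": [],
    "sequence_analysis": ["operation"],
    "sequence_alignment": ["sequence_analysis"],
    "blast_search": ["sequence_alignment"],
    "annotation": ["operation"],
}

def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content(annotations):
    """Resnik-style IC: IC(t) = -log p(t), where p(t) is the fraction of
    annotations whose term is t or one of t's descendants."""
    counts = Counter()
    for term in annotations:
        # An annotation with a term also counts toward every ancestor of it.
        for anc in ancestors(term):
            counts[anc] += 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}

def mean_ic(terms, ic):
    """Average IC of a set of assigned terms (e.g. one workflow's annotations)."""
    return sum(ic[t] for t in terms) / len(terms)

# Hypothetical annotation corpus: one ontology term per annotation.
corpus = ["blast_search", "blast_search", "sequence_alignment", "annotation"]
ic = information_content(corpus)
print(round(ic["blast_search"], 3))  # more specific term, higher IC
print(round(ic["operation"], 3))     # root term used by every annotation, IC 0
```

Comparing the mean IC of automatically assigned terms with the mean IC of terms in a manually curated resource gives one rough, quantitative proxy for annotation specificity. It does not measure whether the terms are correct.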