CL | AI | CE | Apr 29, 2018

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

arXiv:1804.10922v1 | 156 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for better semantic similarity measures in biomedical ontology analysis, offering a tool that can enhance data annotation and integration tasks, though it is incremental as it builds on existing Word2Vec and ontology methods.

The paper tackles the underuse of ontology meta-data in biomedical similarity prediction by proposing OPA2Vec, a method that combines formal axioms and annotation axioms to generate vector representations of biological entities. The authors report improved predictions for protein-protein interactions and gene-disease associations.

Motivation: Ontologies are widely used in biology for data annotation, integration, and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotations commonly used in ontologies include class labels, descriptions, or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures.

Results: We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on PubMed abstracts to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins as a similarity measure to predict protein-protein interaction (PPI) on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations. These two experiments are just an illustration of the possible applications of our method. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology.

Availability: https://github.com/bio-ontology-research-group/opa2vec

Contact: robert.hoehndorf@kaust.edu.sa and xin.gao@kaust.edu.sa.
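
The abstract describes using entity vectors as a similarity measure to predict interactions. A minimal sketch of that idea follows, assuming cosine similarity as the measure (the abstract does not name one). The vectors, the protein names, and the helper functions `cosine_similarity` and `rank_candidates` are all hypothetical, invented for illustration. This is not the OPA2Vec code or its API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_candidates(query, embeddings):
    """Rank every other entity by decreasing similarity to `query`."""
    q = embeddings[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, made up for this example (assumption:
# real embeddings would come from a trained model and be much larger).
toy_embeddings = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "protein_C": [0.0, 1.0, 0.0],
    "protein_D": [-1.0, 0.0, 0.0],
}

ranking = rank_candidates("protein_A", toy_embeddings)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Under this sketch, the highest-ranked candidates for a query entity would be treated as its most likely interaction partners or associated diseases. How well such a ranking works depends entirely on the quality of the underlying vectors.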

Code Implementations: 1 repo
