Information-theoretic Interestingness Measures for Cross-Ontology Data Mining
This work addresses the need for efficient methods to discover high-quality relationships in cross-ontology data mining for bioinformatics. The contribution is incremental: it builds on existing metrics and targets a specific domain.
The authors address the problem of mining relationships across bio-ontologies by developing a data mining method with ontology-guided generalization and a new information-theoretic interestingness metric. As a proof of concept, they apply the method to mouse gene expression data and compare the new metric to existing metrics.
Community annotation of biological entities with concepts from multiple bio-ontologies has created large and growing repositories of ontology-based annotation data with embedded implicit relationships among orthogonal ontologies. The development of efficient data mining methods, and of metrics to assess the quality of the mined relationships, has not kept pace with the growth of annotation data. In this study, we present a data mining method that uses ontology-guided generalization to discover relationships across ontologies, along with a new interestingness metric based on information theory. As a preliminary proof of concept, we apply our data mining algorithm and interestingness measures to datasets from the Gene Expression Database at Mouse Genome Informatics to mine relationships between developmental stages in the mouse anatomy ontology and Gene Ontology concepts (biological process, molecular function and cellular component). In addition, we compare our interestingness metric to four existing metrics. Ontology-based annotation datasets provide a valuable resource for the discovery of relationships across ontologies. The use of efficient data mining methods and appropriate interestingness metrics enables the identification of high-quality relationships.
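To make the idea of ontology-guided generalization with an information-based score more concrete, the following is a minimal sketch and not the authors' method. The abstract does not define the new metric, so pointwise mutual information (PMI) is used here purely as a stand-in information-theoretic score. The toy ontologies, term names, and the functions `ancestors_or_self`, `generalized_pairs`, and `pmi_scores` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy ontologies (child -> parent; None marks a root).
# These are illustrative only and are not taken from the paper's data.
ANATOMY_PARENT = {"heart": "organ", "liver": "organ", "organ": None}
GO_PARENT = {"cell cycle": "process", "apoptosis": "process", "process": None}


def ancestors_or_self(term, parent):
    """Return the term followed by all of its ancestors up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent[term]
    return chain


def generalized_pairs(annotations):
    """Generalize each (anatomy, GO) annotation to every ancestor pair."""
    records = []
    for anat, go in annotations:
        pairs = set(product(ancestors_or_self(anat, ANATOMY_PARENT),
                            ancestors_or_self(go, GO_PARENT)))
        records.append(pairs)
    return records


def pmi_scores(annotations):
    """Score each generalized pair with PMI (a stand-in metric, in bits)."""
    n = len(annotations)
    pair_count, anat_count, go_count = Counter(), Counter(), Counter()
    for pairs in generalized_pairs(annotations):
        pair_count.update(pairs)
        # Count each term at most once per annotation record.
        anat_count.update({a for a, _ in pairs})
        go_count.update({g for _, g in pairs})
    return {
        (a, g): math.log2((c / n) / ((anat_count[a] / n) * (go_count[g] / n)))
        for (a, g), c in pair_count.items()
    }


if __name__ == "__main__":
    toy = [("heart", "cell cycle"), ("heart", "cell cycle"),
           ("liver", "apoptosis"), ("liver", "cell cycle")]
    for pair, score in sorted(pmi_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In this sketch, a pair of root concepts that co-occurs in every record scores zero, which reflects why a generalization step needs an interestingness measure: very general relationships are frequent but uninformative.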