OntoMerger: An Ontology Integration Library for Deduplicating and Connecting Knowledge Graph Nodes
OntoMerger addresses data redundancy and poor connectivity for researchers and practitioners building knowledge graphs, though the contribution appears incremental because it builds on existing ontology integration concepts.
The paper tackles node duplication in knowledge graphs built from heterogeneous datasets by introducing OntoMerger, a Python library that merges nodes with the same meaning and connects their hierarchies, as demonstrated on a real-world biomedical KG.
Node duplication is a common problem when building knowledge graphs (KGs) from heterogeneous datasets, where it is crucial to merge nodes that have the same meaning. OntoMerger is a Python ontology integration library that deduplicates KG nodes. Our approach takes a set of KG nodes, mappings, and disconnected hierarchies, and generates a set of merged nodes together with a connected hierarchy. In addition, the library provides analytic and data testing functionalities that can be used to fine-tune the inputs, further reduce duplication, and increase the connectivity of the output graph. OntoMerger can be applied to a wide variety of ontologies and KGs. In this paper, we introduce OntoMerger and illustrate its functionality on a real-world biomedical KG.
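
To make the input/output contract described above concrete, the following is a minimal sketch of the general idea, not OntoMerger's actual API or algorithm. The function names (`merge_nodes`, `connect_hierarchy`), the node identifiers, and the rule of keeping the lexicographically smallest identifier as the canonical node are all assumptions made for illustration. It treats mappings as equivalence pairs, groups equivalent nodes with a union-find structure, and then rewrites hierarchy edges onto the canonical nodes so that previously disconnected hierarchies share nodes.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (source, target) pair of node IDs


def merge_nodes(nodes: Iterable[str], mappings: Iterable[Edge]) -> Dict[str, str]:
    """Map every node to a canonical representative using union-find.

    Assumption: each mapping is an equivalence between two node IDs.
    """
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        parent.setdefault(x, x)  # tolerate IDs that appear only in mappings
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Hypothetical rule: the smaller ID becomes the canonical node.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {n: find(n) for n in list(parent)}


def connect_hierarchy(edges: Iterable[Edge], canonical: Dict[str, str]) -> List[Edge]:
    """Rewrite (child, parent) edges onto canonical nodes, dropping self-loops."""
    rewritten = set()
    for child, par in edges:
        c = canonical.get(child, child)
        p = canonical.get(par, par)
        if c != p:
            rewritten.add((c, p))
    return sorted(rewritten)


# Hypothetical input: two sources that both describe "asthma".
nodes = ["SRC_A:asthma", "SRC_A:allergic_asthma", "SRC_B:asthma", "SRC_B:lung_disease"]
mappings = [("SRC_A:asthma", "SRC_B:asthma")]
hierarchies = [
    ("SRC_A:allergic_asthma", "SRC_A:asthma"),  # hierarchy from source A
    ("SRC_B:asthma", "SRC_B:lung_disease"),     # hierarchy from source B
]

canonical = merge_nodes(nodes, mappings)
merged_nodes = sorted(set(canonical.values()))
connected = connect_hierarchy(hierarchies, canonical)
```

In this toy input, the two asthma nodes collapse into one, so the edge from source B is re-attached to the merged node and the two hierarchies form a single connected path. A real integration would also need to handle mapping types other than equivalence and conflicts between sources, which this sketch ignores.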