CMOMgen: Complex Multi-Ontology Alignment via Pattern-Guided In-Context Learning
This work addresses the challenge of fully integrating related but disjoint ontologies in knowledge graph construction, particularly in biomedical domains, although it appears incremental because it builds on existing in-context learning and retrieval-augmented generation methods.
The paper tackles complex multi-ontology matching (CMOM), which aligns a source entity to composite logical expressions over entities from multiple target ontologies for better semantic integration. It presents CMOMgen, an end-to-end strategy combining retrieval-augmented generation and in-context learning that achieves an F1-score of at least 63% and outperforms all baselines in two of three biomedical tasks, placing second in the third.
Constructing comprehensive knowledge graphs requires the use of multiple ontologies in order to fully contextualize data within a domain. Ontology matching finds equivalences between concepts, interconnecting ontologies and creating a cohesive semantic layer. While the state of the art in simple pairwise matching is well established, simple equivalence mappings cannot provide full semantic integration of related but disjoint ontologies. Complex multi-ontology matching (CMOM) aligns one source entity to composite logical expressions of multiple target entities, establishing more nuanced equivalences and provenance along the ontological hierarchy. We present CMOMgen, the first end-to-end CMOM strategy that generates complete and semantically sound mappings without restricting the number of target ontologies or entities. Retrieval-Augmented Generation selects relevant classes to compose the mapping and filters matching reference mappings to serve as examples, enhancing In-Context Learning. The strategy was evaluated on three biomedical tasks with partial reference alignments. CMOMgen outperforms baselines in class selection, demonstrating the impact of having a dedicated strategy. It also achieves an F1-score of at least 63%, outperforming all baselines and ablated versions in two of the three tasks and placing second in the third. Furthermore, a manual evaluation of non-reference mappings showed that 46% of them achieved the maximum score, further substantiating CMOMgen's ability to construct semantically sound mappings.
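The retrieval step described above, selecting candidate target classes and picking similar reference mappings as in-context examples, can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: lexical Jaccard similarity stands in for whatever retriever CMOMgen actually uses, every `ex:` identifier and the Manchester-style expressions are hypothetical, and the LLM call that would complete the prompt is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetClass:
    """A candidate class from one of the target ontologies (hypothetical IDs)."""
    iri: str
    label: str


@dataclass(frozen=True)
class ReferenceMapping:
    """A known complex mapping: source label -> composite target expression."""
    source_label: str
    expression: str


def tokens(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for embedding-based retrieval.
    return set(text.lower().replace("-", " ").split())


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_k(query, items, key, k):
    # Rank items by similarity to the query (stable for ties) and keep the k best.
    return sorted(items, key=lambda it: jaccard(query, key(it)), reverse=True)[:k]


def build_prompt(source_label, classes, references, k_classes=3, k_examples=2):
    """Assemble a few-shot prompt for one source entity (hypothetical format)."""
    selected = top_k(source_label, classes, lambda c: c.label, k_classes)
    examples = top_k(source_label, references, lambda r: r.source_label, k_examples)
    lines = ["Map the source class to a logical expression over the target classes."]
    for ex in examples:
        lines.append(f"Example: {ex.source_label} EquivalentTo {ex.expression}")
    lines.append("Candidate target classes:")
    lines.extend(f"- {c.iri} ({c.label})" for c in selected)
    # The model is expected to complete the final line with an expression.
    lines.append(f"Source: {source_label} EquivalentTo")
    return "\n".join(lines), selected, examples


# Hypothetical toy data; identifiers are illustrative, not real ontology IRIs.
CLASSES = [
    TargetClass("ex:heart", "heart"),
    TargetClass("ex:abnormal_morphology", "abnormal morphology"),
    TargetClass("ex:liver", "liver"),
    TargetClass("ex:cardiac_muscle", "cardiac muscle"),
]
REFERENCES = [
    ReferenceMapping("abnormal liver morphology",
                     "ex:abnormal_morphology and (ex:inheres_in some ex:liver)"),
    ReferenceMapping("increased heart size",
                     "ex:increased_size and (ex:inheres_in some ex:heart)"),
    ReferenceMapping("decreased body weight", "ex:decreased_weight"),
]

if __name__ == "__main__":
    prompt, _, _ = build_prompt("abnormal heart morphology", CLASSES, REFERENCES)
    print(prompt)
```

Keeping class selection and example selection as two separate retrievals mirrors the two roles the abstract assigns to Retrieval-Augmented Generation; a real system would likely swap the Jaccard ranking for dense embeddings and validate the generated expression against the target ontologies.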