Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT
For biomedical ontology users, this work addresses the practical challenge of retrieving concepts when queries lack exact matches, with a method that generalizes to other ontologies.
The paper tackles hierarchical concept retrieval from SNOMED CT with out-of-vocabulary (OOV) queries and proposes a method that uses language model-based ontology embeddings in hyperbolic space. The approach outperforms SBERT, SapBERT, and lexical matching baselines on three newly constructed datasets.
SNOMED CT is a large-scale biomedical ontology that models terminological concepts in a hierarchy. Knowledge retrieval in SNOMED CT is critical for its application but often proves challenging due to linguistic phenomena such as ambiguity, synonymy, and polysemy. This problem is exacerbated when the queries are out-of-vocabulary (OOV), i.e., lacking any equivalent matches in the ontology. In this work, we focus on the problem of hierarchical concept retrieval from SNOMED CT with OOV queries, and propose an approach based on language model-based ontology embeddings, which represent hierarchical concepts in a hyperbolic space to enable efficient subsumption inference between a textual query and an arbitrary concept. For evaluation, we construct three datasets where OOV queries are annotated against SNOMED CT concepts, testing the retrieval of the most specific subsumers and their less relevant ancestors. We find that our method outperforms the baselines, including SBERT, SapBERT, and two lexical matching methods. While evaluated against SNOMED CT, the approach is generalisable and can be extended to other ontologies. We release all the experiment code and datasets at https://github.com/jonathondilworth/HR-OOV-SNOMED-CT.
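
To illustrate how embeddings in hyperbolic space can support subsumption inference, the following is a minimal sketch and not the implementation released with the paper. It assumes that the query and the concepts have already been embedded as points inside a unit Poincaré ball, and that more general concepts lie closer to the origin. The scoring function `subsumption_score`, the weight `depth_weight`, the helper `rank_subsumers`, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Closed form: arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_norm(u):
    """Distance of a point from the origin (a simple proxy for depth)."""
    return math.sqrt(sum(x * x for x in u))


def subsumption_score(query, concept, depth_weight=1.0):
    """Hypothetical score: higher means `concept` is a more plausible subsumer.

    Assumption: a subsumer should be close to the query and no farther from
    the origin than the query, so both terms are penalised.
    """
    depth_gap = euclidean_norm(concept) - euclidean_norm(query)
    return -(poincare_distance(query, concept) + depth_weight * depth_gap)


def rank_subsumers(query, concepts, depth_weight=1.0):
    """Return (label, score) pairs sorted from best to worst candidate."""
    scored = [
        (label, subsumption_score(query, vec, depth_weight))
        for label, vec in concepts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D embeddings (invented for illustration, not taken from SNOMED CT).
toy_concepts = {
    "Disease": (0.10, 0.00),
    "Heart disease": (0.50, 0.05),
    "Fracture": (-0.50, 0.40),
}
toy_query = (0.70, 0.10)  # stands in for an embedded OOV query

ranked = rank_subsumers(toy_query, toy_concepts)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

With these toy vectors, the nearby and slightly shallower concept ranks first, its more general ancestor second, and the unrelated concept last, which mirrors the intended behaviour of retrieving the most specific subsumer ahead of its ancestors. In practice the embeddings would come from a trained language model-based ontology encoder, and the scoring function would follow that model's own definition.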