BioLORD: Learning Ontological Representations from Definitions (for Biomedical Concepts and their Textual Descriptions)
This work addresses poor semantic alignment in biomedical concept representations, a concern for both researchers and practitioners, and offers an incremental improvement over existing contrastive learning methods.
The paper tackles the problem of non-semantic representations in biomedical concept and text similarity by introducing BioLORD, a pre-training strategy that grounds concept representations in definitions and descriptions drawn from ontologies, and it reports new state-of-the-art performance on the MedSTS and MayoSRS benchmarks.
This work introduces BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, this approach sometimes results in non-semantic representations. BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that more closely match the hierarchical structure of ontologies. BioLORD establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (MayoSRS).
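The idea of pulling a concept name towards its definition while preventing collapse through contrastive learning can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an InfoNCE-style loss with in-batch negatives, a hypothetical function name `info_nce_loss`, an illustrative temperature of 0.07, and toy vectors standing in for the outputs of a text encoder.

```python
import numpy as np


def info_nce_loss(name_emb, desc_emb, temperature=0.07):
    """Contrastive loss between concept names and their definitions.

    Row i of name_emb and row i of desc_emb are assumed to describe the
    same concept; every other row in the batch serves as a negative.
    The temperature value is an illustrative assumption.
    """
    # L2-normalise so that dot products are cosine similarities.
    name_emb = name_emb / np.linalg.norm(name_emb, axis=1, keepdims=True)
    desc_emb = desc_emb / np.linalg.norm(desc_emb, axis=1, keepdims=True)

    # Similarity of every name to every definition in the batch.
    logits = name_emb @ desc_emb.T / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct definition for name i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    # Toy embeddings: four concepts, each with its own orthogonal direction.
    names = np.eye(4)
    aligned_defs = np.eye(4)                        # each definition matches its name
    shuffled_defs = np.roll(np.eye(4), 1, axis=0)   # each definition matches another name
    print("aligned loss:", info_nce_loss(names, aligned_defs))
    print("shuffled loss:", info_nce_loss(names, shuffled_defs))
```

The loss is small when each name is closest to its own definition and large when definitions are mismatched. Because the other definitions in the batch act as negatives, a model that maps every input to the same vector cannot minimise it, which is the sense in which the contrastive term prevents collapse.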