Multimodal Contrastive Representation Learning in Augmented Biomedical Knowledge Graphs
This work aims to help researchers and practitioners uncover valuable biomedical connections, such as drug-disease relations, though its contribution appears incremental because it builds on existing methods.
The paper tackles link prediction in biomedical knowledge graphs by introducing a multimodal approach that combines language model embeddings with graph contrastive learning and knowledge graph embeddings, achieving strong generalizability and accurate predictions on datasets like PrimeKG++ and DrugBank.
Biomedical Knowledge Graphs (BKGs) integrate diverse datasets to elucidate complex relationships within the biomedical field. Effective link prediction on these graphs can uncover valuable connections, such as potential novel drug-disease relations. We introduce a novel multimodal approach that unifies embeddings from specialized Language Models (LMs) with Graph Contrastive Learning (GCL) to enhance intra-entity relationships, while employing a Knowledge Graph Embedding (KGE) model to capture inter-entity relationships for effective link prediction. To address limitations in existing BKGs, we present PrimeKG++, an enriched knowledge graph incorporating multimodal data, including biological sequences and textual descriptions for each entity type. By combining semantic and relational information in a unified representation, our approach demonstrates strong generalizability, enabling accurate link predictions even for unseen nodes. Experimental results on PrimeKG++ and the DrugBank drug-target interaction dataset demonstrate the effectiveness and robustness of our method across diverse biomedical datasets. Our source code, pre-trained models, and data are publicly available at https://github.com/HySonLab/BioMedKG.
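To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a contrastive objective over two views of the same entities and a triple-scoring function for link prediction. The abstract does not say which contrastive loss or which KGE model is used, so the InfoNCE loss and the DistMult scorer here are assumptions chosen for illustration. The names `info_nce` and `distmult_score` and the temperature value are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss between two views of the same entities.

    Assumption: row i of z1 and row i of z2 embed the same entity
    (a positive pair); every other row of z2 serves as a negative.
    """
    # L2-normalise so the dot product becomes cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def distmult_score(head, relation, tail):
    """DistMult plausibility score for a (head, relation, tail) triple.

    Assumption: DistMult stands in for whichever KGE model is used.
    """
    return float(np.sum(head * relation * tail))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))                    # e.g. one modality or augmentation
    view_b = view_a + 0.05 * rng.normal(size=(8, 16))    # a slightly perturbed second view
    print("contrastive loss:", info_nce(view_a, view_b))
    print("triple score:", distmult_score(view_a[0], rng.normal(size=16), view_a[1]))
```

In this sketch the contrastive term pulls the two views of each entity together, which corresponds to the intra-entity role the abstract assigns to GCL, while the scorer ranks candidate triples, which corresponds to the inter-entity role assigned to the KGE model. Scoring an unseen node would require only its LM-derived embedding, under the assumption that the encoder is shared across entities.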