AI CL · Jun 24, 2020

Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings

David Chang, Ivana Balazevic, Carl Allen, Daniel Chawla, Cynthia Brandt, Richard Andrew Taylor

arXiv:2006.13774v160.51007 citations · Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the need for reliable methods to leverage biomedical knowledge graphs for machine learning applications, offering a domain-specific benchmark and best practices.

The paper addresses the problem of learning biomedical concept embeddings from knowledge graphs, an area that has lagged behind text-based representation learning despite advances in NLP. It trains state-of-the-art knowledge graph embedding models on SNOMED-CT and provides a benchmark comparing them with existing methods.

Much of biomedical and healthcare data is encoded in discrete, symbolic form such as text and medical codes. There is a wealth of expert-curated biomedical domain knowledge stored in knowledge bases and ontologies, but the lack of reliable methods for learning knowledge representation has limited their usefulness in machine learning applications. While text-based representation learning has significantly improved in recent years through advances in natural language processing, attempts to learn biomedical concept embeddings so far have been lacking. A recent family of models called knowledge graph embeddings has shown promising results on general domain knowledge graphs, and we explore their capabilities in the biomedical domain. We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph, provide a benchmark with comparison to existing methods and in-depth discussion on best practices, and make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation. The embeddings, code, and materials will be made available to the community.
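
To make "knowledge graph embedding" and "multi-relational" concrete, here is a minimal sketch of how one well-known model, TransE, scores a (head, relation, tail) triple. The abstract does not name the models that were trained, so the choice of TransE is an assumption made for illustration. The concept names, relation names, and embedding dimension are hypothetical placeholders, not real SNOMED-CT identifiers, and the embeddings are random rather than trained.

```python
import math
import random

# Toy triples in (head, relation, tail) form. The concept and relation names
# are illustrative placeholders, not actual SNOMED-CT identifiers.
TRIPLES = [
    ("viral_pneumonia", "is_a", "pneumonia"),
    ("pneumonia", "finding_site", "lung"),
    ("viral_pneumonia", "causative_agent", "virus"),
]


def init_embeddings(names, dim, rng):
    """Assign each name a random vector (untrained; for illustration only)."""
    return {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for name in names}


def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance between (h + r) and t.

    A score closer to zero means the triple is considered more plausible.
    """
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


rng = random.Random(0)
entities = sorted({e for h, _, t in TRIPLES for e in (h, t)})
relations = sorted({r for _, r, _ in TRIPLES})

# Each relation type gets its own vector; this is what lets the model use
# the multi-relational structure instead of treating all edges alike.
entity_emb = init_embeddings(entities, 8, rng)
relation_emb = init_embeddings(relations, 8, rng)


def score_triple(head, relation, tail):
    """Score a named triple using the lookup tables above."""
    return transe_score(entity_emb[head], relation_emb[relation], entity_emb[tail])
```

In a real setup, training would adjust the vectors so that observed triples score higher than corrupted ones; that step is omitted here.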
