CL · Nov 5, 2020

CODER: Knowledge infused cross-lingual medical term embedding for term normalization

Zheng Yuan, Zhengyun Zhao, Haixia Sun, Jiao Li, Fei Wang, Sheng Yu

arXiv:2011.02947v35.1129 citations · Has Code

Originality Incremental advance

AI Analysis

This paper addresses the challenge of standardizing medical terminology across languages. It is an incremental improvement, with gains on specific biomedical NLP tasks.

The paper tackles the problem of medical term normalization across languages by proposing CODER, a contrastive learning method on knowledge graphs, which outperforms state-of-the-art embeddings in benchmarks like zero-shot term normalization and semantic similarity.

This paper proposes CODER: contrastive learning on knowledge graphs for cross-lingual medical term representation. CODER is designed for medical term normalization: it provides close vector representations for different terms that represent the same or similar medical concepts, with cross-lingual support. We train CODER via contrastive learning on a medical knowledge graph (KG), the Unified Medical Language System, where similarities are calculated using both terms and relation triplets from the KG. Training with relations injects medical knowledge into the embeddings and aims to provide potentially better machine learning features. We evaluate CODER on zero-shot term normalization, semantic similarity, and relation classification benchmarks, which show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings. Our code and models are available at https://github.com/GanjinZero/CODER.
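The idea of normalizing a term by finding the closest concept vector can be illustrated with a minimal sketch. It assumes term embeddings have already been computed by some encoder; the 3-dimensional vectors, the concept IDs, and the helper names `cosine` and `normalize_term` are hypothetical and invented for illustration. They are not CODER outputs or part of the CODER code base.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalize_term(query_vec, concept_vecs):
    """Return (concept_id, similarity) for the concept vector closest to the query."""
    return max(
        ((cid, cosine(query_vec, vec)) for cid, vec in concept_vecs.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors (assumption): in practice these would come from a trained term encoder.
concept_vecs = {
    "CONCEPT_A": [0.9, 0.1, 0.0],
    "CONCEPT_B": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # embedding of an unseen surface form of concept A

best_id, best_sim = normalize_term(query_vec, concept_vecs)
print(best_id, round(best_sim, 3))
```

No task-specific classifier is trained in this sketch: the query is mapped to whichever concept vector is nearest, which is the sense in which such a lookup is zero-shot.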
