CL · Jun 7, 2020

Medical Concept Normalization in User Generated Texts by Learning Target Concept Embeddings

Katikapalli Subramanyam Kalyan, S. Sangeetha

arXiv:2006.04014v1 · 0.59 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of mapping health-related text to standard concepts, a core task in medical informatics, and offers incremental improvements over existing methods.

The paper tackles medical concept normalization by jointly learning representations of input concept mentions and target concepts, improving accuracy by up to 2.31% on three standard datasets.

Medical concept normalization discovers standard concepts in free-form text, i.e., it maps health-related mentions to standard concepts in a vocabulary. It goes well beyond simple string matching and requires a deep semantic understanding of concept mentions. Recent research approaches concept normalization as either text classification or text matching. The main drawback of existing (a) text classification approaches is that they ignore valuable target concept information when learning the input concept mention representation, and of existing (b) text matching approaches is that they need to generate target concept embeddings separately, which is time- and resource-consuming. Our proposed model overcomes these drawbacks by jointly learning the representations of the input concept mention and the target concepts. First, it learns the input concept mention representation using RoBERTa. Second, it computes the cosine similarity between the embedding of the input concept mention and the embeddings of all target concepts. Here, the target concept embeddings are randomly initialized and then updated during training. Finally, the target concept with the maximum cosine similarity is assigned to the input concept mention. Our model surpasses all existing methods across three standard datasets, improving accuracy by up to 2.31%.
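The prediction step described in the abstract (score the mention embedding against every target concept embedding by cosine similarity, then pick the maximum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: a plain list of floats stands in for the RoBERTa mention representation, the concept embeddings are only randomly initialized (the training updates are omitted), and the names `cosine`, `init_concept_embeddings`, and `normalize_mention`, as well as the concept IDs, are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def init_concept_embeddings(concept_ids, dim, seed=0):
    """Randomly initialize one embedding per target concept.

    In the described model these vectors would then be updated during
    training; that step is not shown here.
    """
    rng = random.Random(seed)
    return {cid: [rng.gauss(0.0, 1.0) for _ in range(dim)] for cid in concept_ids}


def normalize_mention(mention_vec, concept_embeddings):
    """Return the concept ID whose embedding is most cosine-similar to the mention."""
    return max(
        concept_embeddings,
        key=lambda cid: cosine(mention_vec, concept_embeddings[cid]),
    )


# Hypothetical concept IDs; a real vocabulary would supply its own identifiers.
concepts = init_concept_embeddings(["C1", "C2", "C3"], dim=8)

# Stand-in for an encoder output: a scaled copy of one concept vector,
# which points in the same direction and so has cosine similarity ~1 with it.
mention_vec = [2.0 * x for x in concepts["C2"]]
predicted = normalize_mention(mention_vec, concepts)
```

Because cosine similarity ignores vector length, the scaled copy of the `C2` embedding is still assigned to `C2`. In a trained model, the encoder and the concept embedding table would be optimized together so that mentions land near their correct concept.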
