CL NE

Oct 3, 2019

Extracting UMLS Concepts from Medical Text Using General and Domain-Specific Deep Learning Models

Kathleen C. Fraser, Isar Nejadgholi, Berry De Bruijn, Muqun Li, Astha LaPlante, Khaldoun Zine El Abidine

arXiv:1910.01274v11.710 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses entity recognition for clinical NLP applications, but its contribution is incremental: it mainly applies existing methods to a new dataset.

The paper tackles entity recognition in medical text by applying deep learning models to the newly released MedMentions dataset and to the i2b2 2010 benchmark, reaching a state-of-the-art F1 of 0.90 on i2b2 2010 but only 0.63 on MedMentions.

Entity recognition is a critical first step to a number of clinical NLP applications, such as entity linking and relation extraction. We present the first attempt to apply state-of-the-art entity recognition approaches on a newly released dataset, MedMentions. This dataset contains over 4000 biomedical abstracts, annotated for UMLS semantic types. In comparison to existing datasets, MedMentions contains a far greater number of entity types, and thus represents a more challenging but realistic scenario in a real-world setting. We explore a number of relevant dimensions, including the use of contextual versus non-contextual word embeddings, general versus domain-specific unsupervised pre-training, and different deep learning architectures. We contrast our results against the well-known i2b2 2010 entity recognition dataset, and propose a new method to combine general and domain-specific information. While producing a state-of-the-art result for the i2b2 2010 task (F1 = 0.90), our results on MedMentions are significantly lower (F1 = 0.63), suggesting there is still plenty of opportunity for improvement on this new data.
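The F1 scores quoted above are entity-level metrics. The abstract does not say how spans were matched, so the following is only a minimal sketch under an assumed exact-match criterion (a predicted entity counts as correct when its boundaries and type both match a gold entity) with BIO-tagged tokens. The function names and the `Disease` and `Drug` labels are hypothetical and chosen for illustration; they are not taken from the paper or from either dataset.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, entity type); an illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a single BIO-tagged token sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # A stray "I-" tag with no open span of that type is ignored here.
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans (assumed criterion)."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical example: the model finds one of the two gold entities.
gold_tags = ["B-Disease", "I-Disease", "O", "B-Drug"]
pred_tags = ["B-Disease", "I-Disease", "O", "O"]
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

In this toy case precision is 1.0 and recall is 0.5. The sketch scores a single sequence; a corpus-level score would pool the span counts across all sentences before computing precision and recall.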
