CL, Dec 21, 2020

Domain specific BERT representation for Named Entity Recognition of lab protocol

arXiv:2012.11145v1

AI Analysis

This work provides an incremental improvement for NER in the domain of medical lab protocols, which is important for researchers and practitioners working with medical text data.

The paper addresses the challenge of Named Entity Recognition (NER) in medical lab protocols, where specialized vocabulary hinders traditional BERT models. The authors' Bio-BERT-based system achieved the fourth-highest F1 score and the second-highest Recall, trailing the best F1 score by 2.21 points.

Supervised models trained to predict properties from representations have achieved high accuracy on a variety of tasks. For instance, the BERT family works exceptionally well on downstream tasks ranging from NER tagging to a range of other linguistic tasks. However, the vocabulary used in the medical field contains many tokens that appear only in that domain, such as the names of diseases, devices, organisms, and medicines, which makes it difficult for a traditional BERT model to create contextualized embeddings. In this paper, we describe a system for Named Entity Tagging based on Bio-BERT. Experimental results show that our model gives substantial improvements over the baseline; it placed fourth runner-up in terms of F1 score and first runner-up in terms of Recall, just 2.21 F1 points behind the best system.
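The vocabulary problem described in the abstract can be illustrated with a minimal sketch of greedy longest-match-first WordPiece splitting, the subword scheme used by BERT tokenizers. This is not the authors' code: the function name `wordpiece` and both toy vocabularies are hypothetical, chosen only to show how a lab term with no whole-word entry is broken into several short pieces, while a vocabulary that contains the term keeps it intact.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercase word by greedy longest-match-first WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies, for illustration only.
GENERAL_VOCAB = {"cent", "##ri", "##fu", "##ge",
                 "sup", "##ern", "##ata", "##nt", "the", "tube"}
DOMAIN_VOCAB = GENERAL_VOCAB | {"centrifuge", "supernatant"}

general_pieces = wordpiece("centrifuge", GENERAL_VOCAB)
domain_pieces = wordpiece("centrifuge", DOMAIN_VOCAB)
print(general_pieces)  # ['cent', '##ri', '##fu', '##ge']
print(domain_pieces)   # ['centrifuge']
```

With the general toy vocabulary the term is spread over four pieces, so its embedding has to be assembled from fragments that carry little domain meaning on their own. How much of Bio-BERT's gain comes from vocabulary and how much from pretraining on biomedical text is not something this sketch can show.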
