CL · Dec 21, 2020

Domain specific BERT representation for Named Entity Recognition of lab protocol

arXiv:2012.11145v1
AI Analysis

This work offers an incremental improvement to NER on medical lab protocols, a task relevant to researchers and practitioners who work with medical text data.

The paper addresses Named Entity Recognition (NER) in medical lab protocols, where specialized vocabulary makes it hard for general-domain BERT models to produce good contextualized embeddings. The authors' BioBERT-based system placed as fourth runner-up by F1 score and first runner-up by Recall, 2.21 F1 points behind the best system.
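NER systems like this one are usually ranked by entity-level precision, recall, and F1. The sketch below shows one common way such scores are computed, assuming exact-match scoring of BIO-tagged spans (a predicted entity counts only if both its type and its boundaries match the gold annotation); the shared task's official scorer may differ in details. The helper names `bio_to_spans` and `entity_prf1`, the tag types `Action`, `Amount`, `Reagent`, `Location`, and the example sentence are hypothetical illustrations, not code or data from the paper.

```python
from typing import List, Optional, Set, Tuple

# A span is (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    ent_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == ent_type:
            continue  # still inside the current entity
        if ent_type is not None:
            spans.add((ent_type, start, i))
            ent_type = None
        if tag.startswith(("B-", "I-")):
            # A stray I- tag (no matching open entity) is leniently treated as a start.
            ent_type, start = tag[2:], i
    return spans


def entity_prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over all sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)  # same type and same boundaries
        fp += len(p_spans - g_spans)  # predicted but not in gold
        fn += len(g_spans - p_spans)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Add 5 ml of PBS to the tube"
gold = [["B-Action", "B-Amount", "I-Amount", "O", "B-Reagent", "O", "O", "B-Location"]]
pred = [["B-Action", "B-Amount", "O", "O", "B-Reagent", "O", "O", "O"]]
p, r, f = entity_prf1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Under exact-match scoring, a boundary error such as tagging only `5` instead of `5 ml` costs both a false positive and a false negative, which is why recall and F1 can diverge noticeably between systems.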

Supervised models trained to predict properties from representations have achieved high accuracy on a variety of tasks. For instance, the BERT family works exceptionally well on downstream tasks ranging from NER tagging to many other linguistic tasks. However, the vocabulary of the medical field contains many tokens used only in that domain, such as the names of diseases, devices, organisms, and medicines, which makes it difficult for a general-domain BERT model to create good contextualized embeddings. In this paper, we present a system for named entity tagging based on BioBERT. Experimental results show that our model gives substantial improvements over the baseline, placing as fourth runner-up in terms of F1 score and first runner-up in terms of Recall, just 2.21 F1 points behind the best system.
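The vocabulary problem described in the abstract arises at the subword-tokenization step: BERT splits each word with WordPiece, so a term missing from a general-domain vocabulary is broken into several short fragments, or mapped to an unknown token, before any contextual embedding is computed. The sketch below implements greedy longest-match-first WordPiece splitting over a tiny, hypothetical vocabulary `VOCAB` (real BERT vocabularies have roughly 30,000 entries); the function name `wordpiece` and the vocabulary are illustrative assumptions, not the paper's code.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # nothing matches: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Tiny hypothetical "general-domain" vocabulary, for illustration only.
VOCAB = {"add", "the", "tube", "to", "ther", "##mo", "##cy", "##cle", "##r"}

for w in ["tube", "thermocycler", "pipette"]:
    print(w, "->", wordpiece(w, VOCAB))
# tube -> ['tube']
# thermocycler -> ['ther', '##mo', '##cy', '##cle', '##r']
# pipette -> ['[UNK]']
```

A common word maps to one piece, while a lab-specific device name is fragmented into five. A domain-adapted model such as BioBERT, which is pretrained on biomedical text, learns representations for such fragments from domain usage rather than from general text alone.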

Code Implementations: 1 repo
