Robust Named Entity Recognition in Idiosyncratic Domains
This addresses the issue for tasks like entity linking and relation extraction in domain-specific texts, but it appears incremental as it builds on existing deep learning methods.
The paper tackles the problem of named entity recognition failing in idiosyncratic domains, proposing a robust approach that achieves F1 scores of 84-94% on standard datasets.
Named entity recognition often fails in idiosyncratic domains. That causes a problem for depending tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e.g. Medline). Our approach is based on deep contextual sequence learning and utilizes stacked bidirectional LSTM networks. Our model is trained with only few hundred labeled sentences and does not rely on further external knowledge. We report from our results F1 scores in the range of 84-94% on standard datasets.