CL AI LG

Nov 12, 2020

Biomedical Named Entity Recognition at Scale

arXiv:2011.06315v14.085 citations

Originality: Incremental advance

AI Analysis

This work addresses scalable, efficient NER for biomedical applications, providing a production-ready solution that can handle large datasets and can be extended to other human languages without code changes.

The paper tackles biomedical named entity recognition by reimplementing a Bi-LSTM-CNN-Char architecture on Apache Spark. It reports new state-of-the-art results on seven public benchmarks without heavy contextual embeddings such as BERT, including BC4CHEMD at 93.72% (a 4.1% gain), Species800 at 80.91% (a 4.6% gain), and JNLPBA at 81.29% (a 5.2% gain).

Named entity recognition (NER) is a widely applicable natural language processing task and building block of question answering, topic modeling, information retrieval, etc. In the medical domain, NER plays a crucial role by extracting meaningful chunks from clinical notes and reports, which are then fed to downstream tasks like assertion status detection, entity resolution, relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char deep learning architecture on top of Apache Spark, we present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings like BERT. This includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely available within a production-grade code base as part of the open-source Spark NLP library; can scale up for training and inference in any Spark cluster; has GPU support and libraries for popular programming languages such as Python, R, Scala and Java; and can be extended to support other human languages with no code changes.
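To illustrate what "extracting meaningful chunks" can look like in practice, here is a minimal sketch that groups token-level NER tags into entity chunks ready for a downstream task. It assumes a BIO tagging scheme, which the abstract does not specify. The function name `bio_to_chunks`, the example sentence, and the `CHEMICAL` and `PROBLEM` labels are hypothetical, and the code is plain Python, not the Spark NLP API.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (chunk_text, entity_label) pairs.

    Assumes tags look like "B-LABEL", "I-LABEL", or "O" (hypothetical scheme).
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the open chunk, if any, and reset the accumulator.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk:
            # close the open chunk and drop this token.
            flush()
    flush()
    return chunks


# Hypothetical example sentence and labels.
tokens = ["Patient", "was", "given", "acetylsalicylic", "acid",
          "for", "chest", "pain", "."]
tags = ["O", "O", "O", "B-CHEMICAL", "I-CHEMICAL",
        "O", "B-PROBLEM", "I-PROBLEM", "O"]
print(bio_to_chunks(tokens, tags))
```

The resulting pairs are the kind of input that downstream steps such as entity resolution or relation extraction would consume. Dropping stray `I-` tags is one design choice among several; another would be to treat them as chunk starts.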
