AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning
This work addresses the limited generalizability and single-entity focus of current BioNER methods, a problem relevant to biomedical researchers and text mining applications, though the contribution appears incremental because it builds on existing deep learning methods.
The paper tackles data scarcity and overfitting in biomedical named entity recognition (BioNER) by proposing an all-in-one scheme that draws on existing annotated resources. The resulting tool, AIONER, is reported to be effective and robust on 14 benchmark tasks, and it is further applied to three independent tasks involving entity types not seen in training and to large-scale text such as all of PubMed.
Biomedical named entity recognition (BioNER) seeks to automatically recognize biomedical entities in natural language text, serving as a necessary foundation for downstream text mining tasks and applications such as information extraction and question answering. Manually labeling training data for the BioNER task is costly, however, due to the significant domain expertise required for accurate annotation. The resulting data scarcity causes current BioNER approaches to be prone to overfitting, to suffer from limited generalizability, and to address a single entity type at a time (e.g., gene or disease). We therefore propose a novel all-in-one (AIO) scheme that uses external data from existing annotated resources to enhance the accuracy and stability of BioNER models. We further present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO scheme. We evaluate AIONER on 14 BioNER benchmark tasks and show that it is effective and robust and that it compares favorably to other state-of-the-art approaches such as multi-task learning. We further demonstrate the practical utility of AIONER in three independent tasks to recognize entity types not previously seen in training data, as well as the advantages of AIONER over existing methods for processing biomedical text at a large scale (e.g., all of PubMed).