CL · Feb 20, 2019

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

arXiv:1902.07669v3 · 1225 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of robust, publicly available NLP models for biomedical text processing, a gap that matters for applications in healthcare and science. The contribution is incremental in that it builds on the existing spaCy library.

The paper tackles the poor performance of NLP models under domain shift, specifically on biomedical and clinical text. It introduces scispaCy, a tool that provides practical models for biomedical text processing, and demonstrates their robustness across several tasks and datasets.

Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several tasks and datasets. Models and code are available at https://allenai.github.io/scispacy/

Code Implementations: 1 repo
