Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python
medspaCy addresses clinical text processing for researchers and developers; its contribution is incremental, since it builds on the existing spaCy framework.
The authors address the need for flexible clinical NLP tools by introducing medspaCy, an open-source toolkit built on spaCy that integrates rule-based and machine learning methods. The resulting library provides components for context analysis and terminology mapping and supports rapid pipeline development.
Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables the development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
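To make the idea of rule-based context analysis concrete, the following is a minimal pure-Python sketch of negation detection over matched clinical terms. It is an illustration only and is not medspaCy's API or implementation: every name in it (`TARGET_TERMS`, `NEGATION_TRIGGERS`, `SCOPE_BREAKERS`, `Entity`, `analyze`) and the tiny rule lists are hypothetical, and the forward-scope heuristic is a deliberate simplification of what a real context component would do.

```python
import re
from typing import NamedTuple

# Hypothetical toy rules for illustration only; real clinical lexicons are far larger.
TARGET_TERMS = {"pneumonia": "PROBLEM", "fever": "PROBLEM"}
# Negation triggers as token tuples, longest first so "no evidence of" wins over "no".
NEGATION_TRIGGERS = (("no", "evidence", "of"), ("denies",), ("no",))
# Tokens that end a trigger's forward scope (an assumed, simplified rule).
SCOPE_BREAKERS = {"but", "however", ".", ";"}


class Entity(NamedTuple):
    text: str         # matched term, lowercased
    label: str        # label taken from TARGET_TERMS
    index: int        # token position of the term
    is_negated: bool  # True if the term falls inside a negation scope


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def negation_scopes(tokens):
    """Return half-open (start, end) token ranges that follow a negation trigger."""
    scopes = []
    i = 0
    while i < len(tokens):
        trigger = next(
            (t for t in NEGATION_TRIGGERS if tuple(tokens[i:i + len(t)]) == t),
            None,
        )
        if trigger is None:
            i += 1
            continue
        # The scope starts right after the trigger and runs to the next breaker.
        start = i + len(trigger)
        end = start
        while end < len(tokens) and tokens[end] not in SCOPE_BREAKERS:
            end += 1
        scopes.append((start, end))
        i = start
    return scopes


def analyze(text):
    """Match target terms and flag those that lie inside a negation scope."""
    tokens = tokenize(text)
    scopes = negation_scopes(tokens)
    entities = []
    for i, tok in enumerate(tokens):
        if tok in TARGET_TERMS:
            negated = any(start <= i < end for start, end in scopes)
            entities.append(Entity(tok, TARGET_TERMS[tok], i, negated))
    return entities


if __name__ == "__main__":
    for ent in analyze("No evidence of pneumonia, but the patient reports fever."):
        print(ent.text, ent.label, ent.is_negated)
```

In this sketch, "pneumonia" falls inside the scope opened by "no evidence of" and is flagged as negated, while "fever" follows the scope breaker "but" and is not. Keeping term matching and context rules as separate steps mirrors, in a very reduced form, the modular pipeline design the abstract describes.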