CL, Jan 26, 2021

Spark NLP: Natural Language Understanding at Scale

arXiv:2101.10848v1, 76 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of scalable NLP for enterprise users, particularly in healthcare, by offering a widely used library that integrates seamlessly into distributed environments.

The authors address the challenge of scaling natural language processing with Spark NLP, a library built on Apache Spark ML that provides performant and accurate NLP annotations. According to the abstract, it ships with 1100 pre-trained pipelines and models in more than 192 languages, has been downloaded more than 2.7 million times, and is used by 54% of healthcare organizations.

Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100 pre-trained pipelines and models in more than 192 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing ninefold growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world's most widely used NLP library in the enterprise.

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
