CL · Jan 14, 2013

SpeedRead: A Fast Named Entity Recognition Pipeline

arXiv:1301.2857v1 · 22 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of slow entity extraction for large-scale online content analysis, though it appears incremental because it builds on existing methods.

The authors tackled the computational cost of named entity recognition on web-scale text by developing SpeedRead, a pipeline that runs at least 10 times faster than the Stanford NLP pipeline.

Online content analysis employs algorithmic methods to identify entities in unstructured text. Both machine learning and knowledge-base approaches lie at the foundation of contemporary named entity extraction systems. However, progress in deploying these approaches at web scale has been hampered by the computational cost of NLP over massive text corpora. We present SpeedRead (SR), a named entity recognition pipeline that runs at least 10 times faster than the Stanford NLP pipeline. This pipeline consists of a high-performance Penn Treebank-compliant tokenizer, a close to state-of-the-art part-of-speech (POS) tagger, and a knowledge-based named entity recognizer.
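
The three-stage structure described in the abstract (tokenizer, POS tagger, knowledge-based recognizer) can be illustrated with a toy sketch. This is not SpeedRead's implementation: the function names, the regex tokenizer, the capitalization-based tagger, and the two-entry gazetteer are all hypothetical stand-ins chosen only to show how the stages might chain together.

```python
import re

# Hypothetical toy knowledge base; these entries are illustrative only
# and do not come from the SpeedRead paper.
GAZETTEER = {
    ("New", "York"): "LOCATION",
    ("Stanford",): "ORGANIZATION",
}
MAX_SPAN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    """Split text into word and punctuation tokens.

    A crude stand-in for a Penn Treebank-compliant tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def pos_tag(tokens):
    """Tag capitalized tokens as NNP and everything else as X.

    A placeholder for a real POS tagger.
    """
    return [(tok, "NNP" if tok[:1].isupper() else "X") for tok in tokens]


def recognize_entities(tagged):
    """Greedy longest-match gazetteer lookup over runs of NNP tokens."""
    entities = []
    i = 0
    while i < len(tagged):
        matched = False
        # Try the longest candidate span first, then shrink.
        for span in range(min(MAX_SPAN, len(tagged) - i), 0, -1):
            window = tagged[i:i + span]
            if all(tag == "NNP" for _, tag in window):
                key = tuple(tok for tok, _ in window)
                if key in GAZETTEER:
                    entities.append((" ".join(key), GAZETTEER[key]))
                    i += span
                    matched = True
                    break
        if not matched:
            i += 1
    return entities


def pipeline(text):
    """Chain the three stages: tokenize, tag, then look up entities."""
    return recognize_entities(pos_tag(tokenize(text)))


print(pipeline("She moved from New York to Stanford."))
```

Restricting the lookup to proper-noun runs is one plausible way a POS tagger could narrow the candidates handed to a knowledge-based recognizer; whether SpeedRead does this is an assumption here, not something the abstract states.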
