CL, IR, LG, ML · Dec 5, 2019

Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy

arXiv:1912.10162v1 · 41 citations
Originality: Synthesis-oriented
AI Analysis

This work provides NLP tools for Greek, a less-resourced language, though its contribution is incremental, since it adapts existing methods to new data.

The authors developed an open-source Greek part-of-speech tagger and named entity recognizer using spaCy. The tagger exceeds previously reported state-of-the-art results when scored on the part of speech alone, and the recognizer extends entity types beyond the standard ENAMEX categories (organization, location, person).
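The difference between scoring the full morphological tag and scoring only the part of speech can be made concrete. The sketch below assumes, hypothetically, that each fine-grained tag joins the coarse POS and its morphological features with a `__` separator; the tag format, the helper names and the example tags are illustrative and are not taken from the paper's data or code.

```python
from typing import Callable, Sequence


def coarse_pos(tag: str) -> str:
    """Return the coarse POS from a hypothetical 'POS__Feat=Val|...' tag."""
    return tag.split("__", 1)[0]


def accuracy(
    gold: Sequence[str],
    pred: Sequence[str],
    key: Callable[[str], str] = lambda tag: tag,
) -> float:
    """Token-level accuracy after mapping both tag sequences through `key`."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(key(g) == key(p) for g, p in zip(gold, pred))
    return correct / len(gold)


# Illustrative gold and predicted tags for four tokens (made-up example data).
gold = [
    "DET__Case=Nom|Gender=Fem|Number=Sing",
    "NOUN__Case=Nom|Gender=Fem|Number=Sing",
    "VERB__Mood=Ind|Number=Sing|Person=3",
    "PROPN__Case=Acc|Gender=Fem|Number=Sing",
]
pred = [
    "DET__Case=Nom|Gender=Fem|Number=Sing",
    "NOUN__Case=Acc|Gender=Fem|Number=Sing",  # wrong case, right POS
    "VERB__Mood=Ind|Number=Sing|Person=3",
    "PROPN__Case=Nom|Gender=Fem|Number=Sing",  # wrong case, right POS
]

full_tag_acc = accuracy(gold, pred)                  # morphology must match too
pos_only_acc = accuracy(gold, pred, key=coarse_pos)  # POS alone
print(full_tag_acc, pos_only_acc)  # 0.5 1.0
```

Because any error that leaves the POS intact counts against the full-tag score but not against the POS-only score, POS-only accuracy is always at least as high as full-tag accuracy on the same predictions.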

This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and the classification of tokens into a small set of named-entity classes. The architecture of the model used is introduced. Greek language support was added to the spaCy source code, a feature that did not exist before our contribution, and was used to build the models. In addition, a part-of-speech tagger was trained that detects the morphology of tokens and outperforms state-of-the-art results when classifying only the part of speech. For named entity recognition with spaCy, a model was built that extends the standard ENAMEX types (organization, location, person). Experiments indicate the need for flexibility with out-of-vocabulary words, and an effort is made to resolve this issue. Finally, the evaluation results are discussed.
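The abstract does not say how the out-of-vocabulary issue is handled. One general idea, shown here only as an illustration, is to describe each token by subword features that an unseen word can share with words seen in training. The sketch mirrors, as an assumption, the kind of lexical attributes spaCy documents for tokens (a lowercase form, a one-character prefix, a three-character suffix and an orthographic word shape); it is pure Python, not spaCy's internal code or the paper's method, and `lexical_features` and `shared_features` are hypothetical helpers.

```python
from __future__ import annotations

import unicodedata


def word_shape(text: str) -> str:
    """Map letters to X/x and digits to d, keeping other characters;
    runs of the same shape character are truncated after length 4."""
    text = unicodedata.normalize("NFC", text)  # treat accented letters uniformly
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


def lexical_features(word: str) -> dict[str, str]:
    """Hypothetical feature bundle for one token."""
    word = unicodedata.normalize("NFC", word)
    return {
        "norm": word.lower(),
        "prefix": word[:1],
        "suffix": word[-3:],
        "shape": word_shape(word),
    }


def shared_features(unseen: str, seen: str) -> set[str]:
    """Names of the features on which an unseen word matches a seen word."""
    a, b = lexical_features(unseen), lexical_features(seen)
    return {name for name in a if a[name] == b[name]}


# Suppose "Κέρκυρα" occurred in training and "Άγκυρα" did not.
print(word_shape("Άγκυρα"))                           # Xxxxx
print(sorted(shared_features("Άγκυρα", "Κέρκυρα")))  # ['shape', 'suffix']
```

The full word form of an unseen token carries no learned information, but its shape and suffix can still match those of known location names, which is what gives a model that uses such features some partial generalization.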
