CL AI LG · Sep 26, 2021

Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

Georgi Georgiev, Preslav Nakov, Kuzman Ganchev, Petya Osenova, Kiril Ivanov Simov

arXiv:2109.15121v135.91083 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of named entity recognition for Bulgarian, an under-resourced language, but is incremental as it applies existing methods to new data.

The paper tackles named entity recognition for Bulgarian news text by combining features that are well established for other languages with language-specific ones, achieving an F1 score of 89.4%, which is comparable to state-of-the-art results for English.

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to the state-of-the-art results for English.
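To make the idea of combining generic and language-specific features concrete, here is a minimal sketch of per-token feature extraction of the kind typically fed to a CRF tagger. It is not the authors' feature set: the feature names, the toy gazetteer, the example tag strings, and the prefix-truncation scheme used to derive a coarser tagset from a fine-grained morpho-syntactic tag are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteer; the paper's actual gazetteers are not reproduced here.
LOCATION_GAZETTEER: Set[str] = {"софия", "пловдив", "варна"}


def coarse_tag(tag: str, length: int = 2) -> str:
    """Derive a coarser tag by keeping a prefix of the fine-grained tag.

    Prefix truncation is an assumed scheme, used only to illustrate the
    idea of a task-specific tagset derived from a richer one.
    """
    return tag[:length]


def token_features(tokens: List[str], tags: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Generic orthographic features, common across languages.
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
        # Morpho-syntactic features: the full tag and a derived coarse tag.
        "tag_full": tags[i],
        "tag_coarse": coarse_tag(tags[i]),
        # Gazetteer membership.
        "in_loc_gazetteer": word.lower() in LOCATION_GAZETTEER,
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats["prev_tag_coarse"] = coarse_tag(tags[i - 1])
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
        feats["next_tag_coarse"] = coarse_tag(tags[i + 1])
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], tags: List[str]) -> List[Dict[str, object]]:
    """Extract features for every token in a sentence."""
    return [token_features(tokens, tags, i) for i in range(len(tokens))]


# Illustrative sentence ("He lives in Sofia") with illustrative tag strings.
example_tokens = ["Той", "живее", "в", "София"]
example_tags = ["Ppe-os3m", "Vpitf-r3s", "R", "Npfsi"]
example_features = sentence_features(example_tokens, example_tags)
```

A list of such dictionaries per sentence is the usual input shape for dictionary-based CRF toolkits; which features actually help, and how the local and nonlocal tagsets are defined, is determined in the paper itself rather than by this sketch.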
