Nefnir: A high accuracy lemmatizer for Icelandic
Nefnir provides a high-accuracy lemmatization tool for Icelandic NLP and addresses a specific need for this language. The contribution is incremental, however, since it builds on existing rule-based methods.
The authors tackle lemmatization for Icelandic, a morphologically rich language, with Nefnir, an open-source lemmatizer that applies suffix substitution rules derived from a morphological database. It achieves 99.55% accuracy on correctly tagged text and 96.88% on text tagged with a part-of-speech (PoS) tagger.
Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that for correctly tagged text, Nefnir obtains an accuracy of 99.55%, and for text tagged with a PoS tagger, the accuracy obtained is 96.88%.
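To make "suffix substitution rules derived from a morphological database" concrete, here is a minimal sketch of the general technique, not Nefnir's actual implementation. It assumes the database can be read as (word form, tag, lemma) triples. Each triple yields one rule that strips a suffix from the form and appends a suffix to produce the lemma. Rules are indexed by tag and by the form's ending, and the longest matching ending wins. The function names `derive_rules` and `lemmatize`, the toy entries, and the tags (in the style of Icelandic PoS tags, e.g. `nkeþ` for a masculine singular dative noun) are illustrative assumptions. Nefnir's real rule format, tie-breaking, and handling of unseen words may differ.

```python
from collections import Counter, defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix(word: str, k: int) -> str:
    """Last k characters of word ('' when k == 0, unlike word[-0:])."""
    return word[len(word) - k:]


def derive_rules(entries):
    """Build suffix substitution rules from (form, tag, lemma) triples.

    Each entry yields one substitution (strip, add): remove `strip` from the
    end of the form, then append `add`. The substitution is indexed under
    (tag, s) for every ending s of the form that contains `strip`, so longer,
    more specific endings can be matched first at lookup time.
    """
    rules = defaultdict(Counter)
    for form, tag, lemma in entries:
        p = common_prefix_len(form, lemma)
        strip, add = form[p:], lemma[p:]
        for k in range(len(strip), len(form) + 1):
            rules[(tag, suffix(form, k))][(strip, add)] += 1
    return rules


def lemmatize(word, tag, rules):
    """Apply the most frequent rule for the longest matching ending."""
    for k in range(len(word), -1, -1):
        candidates = rules.get((tag, suffix(word, k)))
        if candidates:
            strip, add = candidates.most_common(1)[0][0]
            return word[:len(word) - len(strip)] + add
    return word  # no rule for this tag: return the word unchanged


# Hypothetical toy "database": (word form, tag, lemma).
ENTRIES = [
    ("hestur", "nken", "hestur"),
    ("hesti", "nkeþ", "hestur"),
    ("hests", "nkee", "hestur"),
    ("vegi", "nkeþ", "vegur"),
    ("vegar", "nkee", "vegur"),
    ("hundi", "nkeþ", "hundur"),
]

rules = derive_rules(ENTRIES)
print(lemmatize("fiski", "nkeþ", rules))   # fiskur (rule: -i -> -ur)
print(lemmatize("fisks", "nkee", rules))   # fiskur (rule: -s -> -ur)
print(lemmatize("fiskur", "nken", rules))  # fiskur (identity rule)
```

The unseen form `fiski` is lemmatized correctly because the dative rule "-i → -ur" generalizes from `hesti`, `vegi`, and `hundi`. Because rules are looked up by tag, a wrong tag selects a different rule or none at all. This is consistent with the lower accuracy reported on automatically tagged text.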