CL · Sep 27, 2018

Building a Lemmatizer and a Spell-checker for Sorani Kurdish

arXiv:1809.10763v1 · 18 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides foundational text-processing tools for Sorani Kurdish that enable further NLP research on this under-resourced language, though the contribution is incremental in that it applies existing methods to a new language.

The paper tackles lemmatization and spell-checking for Sorani Kurdish by developing hybrid tools based on morphological rules and n-gram language models. It reports 86.7% accuracy for lemmatization and 96.4% accuracy for spell-checking with a lexicon (87% without one).
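The general idea of a rule-plus-statistics lemmatizer can be illustrated with a minimal Python sketch. This is not Peyv's actual implementation: the suffix list `SUFFIXES`, the functions `candidate_stems` and `lemmatize`, the Latin transliteration, and the frequency table are all hypothetical, and plain unigram counts stand in for a full n-gram language model. Suffix-stripping rules propose candidate stems, and the frequency table chooses among the candidates that appear in the lexicon.

```python
from collections import Counter

# Hypothetical toy rule set: a few Sorani nominal suffixes in Latin
# transliteration (definite plural, definite singular, indefinite, plural).
# Ordered longest first so "-ekan" is tried before "-an".
SUFFIXES = ["ekan", "eke", "êk", "an"]


def candidate_stems(word: str) -> list[str]:
    """Return the word itself plus every stem left after stripping one known suffix."""
    candidates = [word]
    for suffix in SUFFIXES:
        # Only strip if something remains after removing the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.append(word[: -len(suffix)])
    return candidates


def lemmatize(word: str, lexicon_counts: Counter) -> str:
    """Pick the most frequent candidate stem that appears in the lexicon.

    Unigram counts stand in for an n-gram language model; words with no
    candidate in the lexicon are returned unchanged.
    """
    known = [c for c in candidate_stems(word) if c in lexicon_counts]
    if not known:
        return word
    # Highest count wins; on a tie, prefer the shorter (more fully stripped) stem.
    return max(known, key=lambda c: (lexicon_counts[c], -len(c)))


# Toy lemma frequencies, invented for illustration (not corpus data).
LEXICON = Counter({"kitêb": 50, "mal": 40, "kur": 30})
```

For example, `lemmatize("kitêbekan", LEXICON)` returns `"kitêb"`: the rules propose `kitêb` and `kitêbek`, and only the former is in the lexicon. A real system would need a far richer rule set (clitics, verbal morphology, stacked suffixes) and would operate on the Arabic-based script.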

This paper presents a lemmatization system and a word-level error correction system for Sorani Kurdish. We propose a hybrid approach based on morphological rules and an n-gram language model. We call our lemmatization and error correction systems Peyv and Rênûs, respectively; to the best of our knowledge, they are the first such tools for Sorani Kurdish. The Peyv lemmatizer achieves 86.7% accuracy. Rênûs achieves 96.4% accuracy when using a lexicon and 87% accuracy without one. As two fundamental text processing tools, they can pave the way for further research on natural language processing applications for Sorani Kurdish.
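A lexicon-based word-level corrector in the same spirit can be sketched by generating every string one edit away from an unknown word, keeping the candidates that occur in the lexicon, and ranking them with a smoothed bigram model conditioned on the previous word. This is a generic sketch using edit-distance candidate generation in the style of Peter Norvig's well-known spelling corrector, not Rênûs's actual algorithm. The class `BigramCorrector`, the romanized `ALPHABET`, and the toy corpus are hypothetical; the lexicon is assumed to be every word type in the training sentences, and the lexicon-free mode mentioned in the abstract is not modeled.

```python
from collections import Counter

# Hypothetical Latin alphabet for the toy example; a real system would
# operate on the Sorani Arabic-based script.
ALPHABET = "abcçdefghijklmnopqrsştuûvwxyzêî"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class BigramCorrector:
    """Lexicon lookup plus add-one-smoothed bigram ranking of edit-1 candidates."""

    def __init__(self, sentences: list[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Assumed lexicon: every word type seen in the training sentences.
        self.lexicon = set(self.unigrams) - {"<s>"}
        self.vocab_size = len(self.unigrams)

    def bigram_prob(self, prev: str, word: str) -> float:
        """Laplace-smoothed estimate of P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def correct_word(self, prev: str, word: str) -> str:
        if word in self.lexicon:
            return word  # known words are left untouched
        candidates = edits1(word) & self.lexicon
        if not candidates:
            return word  # nothing within one edit; leave the word as-is
        # Highest bigram probability wins; ties are broken deterministically by string comparison.
        return max(candidates, key=lambda c: (self.bigram_prob(prev, c), c))

    def correct(self, sentence: str) -> str:
        """Correct a whitespace-tokenized sentence left to right."""
        prev, output = "<s>", []
        for token in sentence.split():
            fixed = self.correct_word(prev, token)
            output.append(fixed)
            prev = fixed  # condition the next word on the corrected token
        return " ".join(output)


# Toy romanized token sequences: placeholders, not data from the paper.
CORPUS = ["kur kitêb dexwênê", "kiç kitêb dexwênê", "kur bo mal deçê"]
corrector = BigramCorrector(CORPUS)
```

Here `corrector.correct("kuç kitêb dexwênê")` yields `"kur kitêb dexwênê"`: both `kur` and `kiç` are one substitution away from `kuç`, but `kur` follows the sentence start more often in the toy corpus, so the bigram model prefers it. Restricting candidates to one edit keeps the search small; a production system would typically add an error model and larger edit distances.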


Your Notes