CL FL Sep 14, 2021

Hunspell for Sorani Kurdish Spell Checking and Morphological Analysis

arXiv:2109.06374v1, 8 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in language technology for Sorani Kurdish speakers and researchers, though it is incremental as it applies an existing method (Hunspell) to new data.

The authors tackled the lack of open-source spell checking and morphological analysis tools for Sorani Kurdish, a less-resourced language, by building a morphological analyzer, stemmer, and spell-checking system using Hunspell. They annotated a lexicon with morphosyntactic tags and extracted morphological rules to create this publicly available implementation.

Spell checking and morphological analysis are two fundamental tasks in text and natural language processing, and they are addressed in the early stages of developing language technology. Despite previous efforts, there has been no open-source progress toward such tools for Sorani Kurdish, also known as Central Kurdish, a less-resourced language. In this paper, we present our efforts in annotating a lexicon with morphosyntactic tags and extracting the morphological rules of Sorani Kurdish to build a morphological analyzer, a stemmer, and a spell-checking system using Hunspell. The implementation is released under a publicly available license, so researchers can use it for further developments in the field and it can be integrated into text editors.
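As a rough illustration of how a tagged lexicon plus affix rules can drive spell checking, stemming, and morphological analysis at once, here is a minimal Python sketch of Hunspell-style suffix stripping. Everything in it is an assumption made for illustration: the lexicon, flags, rules, and tags are invented English toy data, not the paper's Sorani Kurdish resources, and the names `SuffixRule`, `analyze`, `is_correct`, and `stem` are hypothetical. Real Hunspell affix files also support prefixes, match conditions, and rule combination, all of which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SuffixRule:
    flag: str   # affix class a lexicon entry must carry to use this rule
    strip: str  # characters removed from the end of the stem before adding
    add: str    # suffix appended after stripping
    tag: str    # morphosyntactic tag reported for the derived form


# Hypothetical toy lexicon: stem -> set of affix flags it accepts.
LEXICON = {"book": {"P"}, "walk": {"D"}, "city": {"Y"}}

# Hypothetical toy rules, loosely mirroring suffix (SFX) entries.
RULES = [
    SuffixRule(flag="P", strip="", add="s", tag="PL"),
    SuffixRule(flag="D", strip="", add="ed", tag="PAST"),
    SuffixRule(flag="Y", strip="y", add="ies", tag="PL"),
]


def analyze(word: str) -> List[Tuple[str, Optional[str]]]:
    """Return every (stem, tag) analysis; tag is None for a bare stem."""
    analyses: List[Tuple[str, Optional[str]]] = []
    if word in LEXICON:
        analyses.append((word, None))
    for rule in RULES:
        if word.endswith(rule.add):
            # Undo the rule: drop the suffix, restore the stripped characters.
            candidate = word[: len(word) - len(rule.add)] + rule.strip
            # Accept only if the stem exists and is licensed for this rule.
            if rule.flag in LEXICON.get(candidate, set()):
                analyses.append((candidate, rule.tag))
    return analyses


def is_correct(word: str) -> bool:
    """A word is accepted if at least one analysis exists."""
    return bool(analyze(word))


def stem(word: str) -> Optional[str]:
    """Return the stem of the first analysis, or None if the word is unknown."""
    found = analyze(word)
    return found[0][0] if found else None
```

In this sketch, `is_correct("walks")` is false even though `walk` is in the lexicon, because that entry does not carry the `P` flag. This shows how per-entry flags keep a rule from over-generating, which is the reason a lexicon needs annotating before the rules are useful.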

Code Implementations: 1 repo