CL FL Apr 9, 2020

FST Morphology for the Endangered Skolt Sami Language

arXiv:2004.04803v1999 citations
AI Analysis

This work addresses the difficulty of doing NLP for Skolt Sami, a severely endangered minority language, by providing a foundational tool for its revitalization. The contribution is incremental in that it applies existing FST methods to new data.

The paper tackles the lack of morphological analysis tools for the endangered Skolt Sami language by developing a finite-state transducer (FST)-based analyzer and generator. The analyzer covers over 30,000 words in 148 inflectional paradigms and over 12 derivational forms.

We present advances in the development of an FST-based morphological analyzer and generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich morphology, on the one hand, and there is little gold-standard material for it, on the other. This makes NLP approaches for its study difficult without a solid morphological analysis. The language is severely endangered and the work presented in this paper forms a part of a greater whole in its revitalization efforts. Furthermore, we intersperse our description with facilitation and description practices not well documented in the infrastructure. Currently, the analyzer covers over 30,000 Skolt Sami words in 148 inflectional paradigms and over 12 derivational forms.
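To make the "analyzer and generator" pairing concrete, here is a minimal sketch of the idea that one relation between analyses (lemma plus tags) and surface forms can be read in both directions. It is not the paper's implementation: the stems, tags, and suffixes are hypothetical English toy data rather than Skolt Sami, and a plain lookup table stands in for a compiled finite-state network.

```python
from collections import defaultdict

# Hypothetical toy paradigm: tag string -> suffix.
# English is used purely for illustration; this is not Skolt Sami data.
NOUN_PARADIGM = {"+N+Sg": "", "+N+Pl": "s"}
LEXICON = ["cat", "dog"]


def build_pairs(lexicon, paradigm):
    """Expand every stem through the paradigm into (analysis, surface) pairs."""
    return [
        (stem + tags, stem + suffix)
        for stem in lexicon
        for tags, suffix in paradigm.items()
    ]


# The same set of pairs is indexed in both directions, mimicking how a
# transducer can be applied "down" (generation) or "up" (analysis).
GENERATE = defaultdict(list)
ANALYZE = defaultdict(list)
for analysis, surface in build_pairs(LEXICON, NOUN_PARADIGM):
    GENERATE[analysis].append(surface)
    ANALYZE[surface].append(analysis)


def generate(analysis):
    """Return all surface forms for an analysis string such as 'cat+N+Pl'."""
    return list(GENERATE.get(analysis, []))


def analyze(surface):
    """Return all analyses for a surface form; empty list if unknown."""
    return list(ANALYZE.get(surface, []))
```

Calling `generate("cat+N+Pl")` looks up the surface form for that analysis, and `analyze("dogs")` performs the reverse lookup. A real morphological transducer also has to encode stem alternations and other phonological rules, which a suffix table like this cannot express.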

Code Implementations: 1 repo
