CL · Apr 29, 2020

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

arXiv:2004.14062v1
AI Analysis

This work addresses the scarcity of linguistic resources for endangered languages such as South Sámi and offers a transferable approach, although its contribution is incremental: it applies existing techniques to a new context.

The paper tackles morphological disambiguation for the endangered South Sámi language by combining an FST-based morphological analyzer with a Bi-RNN model trained on North Sámi data and synthetic South Sámi data; the resulting method requires only minimal resources.

We present a method for morphological disambiguation of South Sámi, an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done at the level of morphological tags, ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it applicable to any other endangered language as well.
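The key design choice in the abstract, disambiguating over tag sequences while ignoring lemmas and word forms, can be illustrated with a minimal sketch. It assumes GiellaLT-style readings of the form `lemma+Tag1+Tag2`, and it swaps in a simple tag-bigram Viterbi decoder for the paper's Bi-RNN purely to keep the example dependency-free; the function names (`tags_only`, `train`, `disambiguate`) and all data are hypothetical. Because the lemma is stripped before both training and decoding, a model trained on one language's gold tag sequences can rank another language's FST readings, provided the two analyzers emit compatible tags (an assumption of this sketch).

```python
import math
from collections import Counter

# HYPOTHETICAL sketch: a tag-bigram Viterbi decoder stands in for the paper's
# Bi-RNN. The reading format "lemma+Tag1+Tag2+..." and all data are made up.

BOS, EOS = ("<s>",), ("</s>",)


def tags_only(reading: str) -> tuple[str, ...]:
    """Drop the lemma and keep only the morphological tags."""
    _lemma, *tags = reading.split("+")
    return tuple(tags)


def train(gold_sentences):
    """Count tag-tuple bigrams in disambiguated (gold) sentences."""
    bigrams, prev_counts = Counter(), Counter()
    for sentence in gold_sentences:
        seq = [BOS] + [tags_only(r) for r in sentence] + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            bigrams[prev, cur] += 1
            prev_counts[prev] += 1
    # Distinct successors seen in training, plus one slot for unseen ones.
    vocab_size = len({cur for _, cur in bigrams}) + 1
    return bigrams, prev_counts, vocab_size


def log_prob(model, prev, cur):
    """Add-one smoothed log P(cur | prev) over tag tuples."""
    bigrams, prev_counts, vocab_size = model
    return math.log((bigrams[prev, cur] + 1) / (prev_counts[prev] + vocab_size))


def disambiguate(candidates, model):
    """Pick one reading per word from the FST's ambiguous candidate lists."""
    # Each state: (tags of the last chosen reading, log-score, readings so far).
    states = [(BOS, 0.0, [])]
    for readings in candidates:
        new_states = []
        for reading in readings:
            tags = tags_only(reading)
            # Keep only the best-scoring path into this reading (Viterbi step).
            score, path = max(
                ((s + log_prob(model, prev, tags), p) for prev, s, p in states),
                key=lambda sp: sp[0],
            )
            new_states.append((tags, score, path + [reading]))
        states = new_states
    _score, best = max(
        ((s + log_prob(model, prev, EOS), p) for prev, s, p in states),
        key=lambda sp: sp[0],
    )
    return best


# Toy gold data standing in for North Sámi training sentences (hypothetical).
gold = [
    ["a+Pron+Pers+Sg1+Nom", "b+V+Ind+Prs+Sg1", "c+N+Sg+Acc"],
    ["d+Pron+Pers+Sg1+Nom", "e+V+Ind+Prs+Sg1", "f+N+Sg+Acc"],
]
model = train(gold)

# Ambiguous analyses for a new sentence; its lemmas never reach the model.
sentence = [
    ["x+Pron+Pers+Sg1+Nom"],
    ["y+V+Ind+Prs+Sg1", "y+N+Sg+Nom"],
    ["z+N+Sg+Acc", "z+N+Pl+Nom"],
]
print(disambiguate(sentence, model))
# → ['x+Pron+Pers+Sg1+Nom', 'y+V+Ind+Prs+Sg1', 'z+N+Sg+Acc']
```

In this sketch, readings that differ only in their lemma collapse to the same tag tuple, so a tag-level model cannot choose between them; that is the trade-off accepted in exchange for not needing a bilingual dictionary or aligned embeddings.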
