CL | Mar 16, 2022

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

arXiv:2203.08909v1 | 639 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses morphological processing for low-resource and endangered languages. It is an incremental advance: it builds on existing approaches and pushes them toward fully unsupervised methods.

The paper tackles unsupervised morphological processing for low-resource languages. It shows that existing models, bridged by two newly proposed ones, perform reasonably on paradigm completion from raw text but leave significant room for improvement. Solving the task could increase the language coverage of morphological resources by orders of magnitude.

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language's morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by a number of magnitudes.

