CL, Aug 13, 2018

Comparing morphological complexity of Spanish, Otomi and Nahuatl

arXiv:1808.04314v11089 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses linguistic analysis for low-resource languages, but it is incremental as it applies existing methods to new data.

The study compared morphological complexity in Spanish, Otomi and Nahuatl using parallel corpora, finding that a language can be complex in the number of word forms it generates yet simpler in how predictable its word structure is.

We use two small parallel corpora to compare the morphological complexity of Spanish, Otomi and Nahuatl. These languages belong to different linguistic families, and the latter two are low-resource. We take into account two quantitative criteria: on the one hand, the distribution of types over tokens in a corpus; on the other, perplexity and entropy as indicators of word structure predictability. We show that a language can be complex in terms of how many different morphological word forms it can produce, yet less complex in terms of the predictability of the internal structure of its words.
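The two criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which model produces the perplexity and entropy figures, so this sketch assumes the simplest possible stand-in, an empirical unigram distribution over characters. The function names and the toy token lists are hypothetical and are not drawn from the paper's corpora.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / len(tokens)


def unigram_entropy(symbols):
    """Shannon entropy, in bits, of the empirical symbol distribution.

    Assumption: a unigram model stands in for whatever model the paper uses.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(entropy_bits):
    """Perplexity corresponding to an entropy measured in bits."""
    return 2 ** entropy_bits


# Made-up toy data for illustration only (hypothetical, not the paper's corpora).
toy_a = "the dog sees the dog and the cat".split()
toy_b = "perro perros perrito perra perras perritos".split()

for name, tokens in (("toy_a", toy_a), ("toy_b", toy_b)):
    chars = [ch for tok in tokens for ch in tok]
    h = unigram_entropy(chars)
    print(name, round(type_token_ratio(tokens), 3), round(h, 3), round(perplexity(h), 3))
```

A higher type-token ratio means more distinct word forms per running token, while lower entropy or perplexity means the symbols inside words are easier to predict. The two measures can point in different directions for the same language, which is the contrast the abstract describes.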

