CLCO Jan 4, 2014

Properties of phoneme N-grams across the world's language families

arXiv:1401.0794v1
Originality: Synthesis-oriented
AI Analysis

This work contributes to linguistic research by analyzing cross-language patterns, but it is incremental: it applies existing statistical methods to new data.

The study examined whether phoneme N-gram distributions, drawn from half of the world's languages, follow a power law. It found that the N-gram distributions of language families parallel the sizes of those families, and that this correlation improves as N increases. Statistical tests of the power-law hypothesis were applied to twelve datasets.

In this article, we investigate the properties of phoneme N-grams across half of the world's languages. We investigate if the sizes of three different N-gram distributions of the world's language families obey a power law. Further, the N-gram distributions of language families parallel the sizes of the families, which seem to obey a power law distribution. The correlation between N-gram distributions and language family sizes improves with increasing values of N. We applied statistical tests, originally given by physicists, to test the hypothesis of power law fit to twelve different datasets. The study also raises some new questions about the use of N-gram distributions in linguistic research, which we answer by running a statistical test.
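As a rough illustration of what testing a power-law fit involves, the sketch below estimates the exponent of a power law by maximum likelihood and measures the Kolmogorov-Smirnov distance between the data and the fitted model. It is not the authors' procedure: the abstract does not specify the tests used, and the continuous approximation, the fixed `xmin`, and the function names `fit_power_law` and `sample_power_law` are all assumptions made for this example. A full test would also select `xmin` from the data and obtain a p-value by bootstrapping, which is omitted here.

```python
import math
import random


def fit_power_law(values, xmin):
    """Fit p(x) ~ x^(-alpha) to the values >= xmin (continuous approximation).

    Returns (alpha, ks), where alpha is the maximum-likelihood exponent and
    ks is the Kolmogorov-Smirnov distance between the empirical CDF of the
    tail and the fitted power-law CDF. xmin is assumed to be given.
    """
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations >= xmin")

    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal xmin; alpha is undefined")

    # Continuous MLE: alpha = 1 + n / sum(ln(x_i / xmin))
    alpha = 1.0 + n / log_sum

    # KS distance: largest gap between the model CDF and the empirical
    # step function, checked on both sides of each step.
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks


def sample_power_law(alpha, xmin, size, rng):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]


if __name__ == "__main__":
    # Synthetic data only; these are not the paper's datasets.
    rng = random.Random(0)
    data = sample_power_law(alpha=2.5, xmin=1.0, size=5000, rng=rng)
    alpha_hat, ks = fit_power_law(data, xmin=1.0)
    print(f"estimated alpha = {alpha_hat:.3f}, KS distance = {ks:.4f}")
```

A small KS distance indicates that the fitted curve tracks the data closely. On its own it does not establish that a power law is the best description, which is why a goodness-of-fit p-value or a comparison against alternative distributions is normally reported alongside it.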

