CL, Mar 3

The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models

arXiv:2603.02860v1, 1 citation, h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a fundamental problem in linguistics by offering a comprehensive model of phoneme frequencies. The advance is rated incremental because it builds on existing information-theoretic approaches.

The study sets out to explain phoneme frequency distributions across languages. At the macroscopic level, rank-frequency patterns follow the order statistics of a symmetric Dirichlet distribution whose concentration parameter scales with inventory size. At the microscopic level, language-specific phoneme probabilities are predicted by a Maximum Entropy model incorporating articulatory, phonotactic, and lexical constraints. Together, these results provide a unified information-theoretic account.

We demonstrate that the frequency distribution of phonemes across languages can be explained at both macroscopic and microscopic levels. Macroscopically, phoneme rank-frequency distributions closely follow the order statistics of a symmetric Dirichlet distribution whose single concentration parameter scales systematically with phonemic inventory size, revealing a robust compensation effect whereby larger inventories exhibit lower relative entropy. Microscopically, a Maximum Entropy model incorporating constraints from articulatory, phonotactic, and lexical structure accurately predicts language-specific phoneme probabilities. Together, these findings provide a unified information-theoretic account of phoneme frequency structure.
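
As a rough illustration of the macroscopic claim, the sketch below estimates the expected rank-frequency curve of a symmetric Dirichlet distribution by Monte Carlo and computes a normalized entropy for it. It is not the authors' code. The function names, the inventory sizes, and the concentration value are hypothetical, and it assumes that "relative entropy" means Shannon entropy divided by its maximum, log N; the paper may define the term differently. The dependence of the concentration parameter on inventory size is not reproduced here, since the abstract does not give its form.

```python
import math
import random


def sample_symmetric_dirichlet(n, alpha, rng):
    """Draw one probability vector from a symmetric Dirichlet(alpha, ..., alpha).

    Uses the standard construction: normalize n independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def expected_rank_frequencies(n, alpha, trials=2000, seed=0):
    """Monte Carlo estimate of the expected order statistics.

    Index 0 holds the expected frequency of the most frequent phoneme.
    """
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        ranked = sorted(sample_symmetric_dirichlet(n, alpha, rng), reverse=True)
        for rank, p in enumerate(ranked):
            sums[rank] += p
    return [s / trials for s in sums]


def relative_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1].

    Assumption: this normalization is what "relative entropy" refers to.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


if __name__ == "__main__":
    # Hypothetical inventory sizes and concentration value, for illustration only.
    for n, alpha in [(20, 1.0), (40, 1.0)]:
        curve = expected_rank_frequencies(n, alpha)
        print(n, alpha, round(curve[0], 3), round(relative_entropy(curve), 3))
```

Comparing an observed, sorted phoneme frequency vector against `expected_rank_frequencies(n, alpha)` for several values of `alpha` is one simple way to see how a single concentration parameter controls the shape of the whole rank-frequency curve.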

