CL · Aug 24, 2023

Probabilistic Method of Measuring Linguistic Productivity

arXiv:2308.12643v1 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of objectively quantifying linguistic productivity, a problem of interest to linguists and computational linguists. It offers a novel approach, though the advance is incremental: it refines existing measurement techniques.

The paper proposes a probabilistic method for measuring linguistic productivity: it assesses an affix's ability to form new words without depending directly on token frequency. Evaluated on English and Russian data, the method suggests that growing productivity shows up as an increasing number of types, and that this increase unfolds in two stages: first in high-frequency items, then in low-frequency ones.

In this paper I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words and, unlike other popular measures, is not directly dependent upon token frequency. Specifically, I suggest that linguistic productivity may be viewed as the probability that an affix combines with a random base. The advantages of this approach include the following. First, token frequency does not dominate the productivity measure but naturally influences the sampling of bases. Second, we are not just counting attested word types with an affix but rather simulating the construction of these types and then checking whether they are attested in the corpus. Third, a corpus-based approach and randomised design ensure that true neologisms and words coined long ago have equal chances of being selected. The proposed algorithm is evaluated on both English and Russian data. The obtained results provide some valuable insights into the relation of linguistic productivity to the number of types and tokens. It appears that burgeoning linguistic productivity manifests itself in an increasing number of types. However, this process unfolds in two stages: first comes the increase in high-frequency items, and only then follows the increase in low-frequency items.
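The sampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `estimate_productivity`, the toy data, and the parameters `n_samples` and `seed` are all hypothetical, and the sketch assumes suffixation by plain string concatenation, ignoring any spelling or morphophonological changes. It draws bases in proportion to their token frequency, attaches the affix, and reports the share of constructed words that are attested.

```python
import random


def estimate_productivity(affix, base_counts, attested_words,
                          n_samples=1000, seed=0):
    """Monte Carlo estimate of P(random base + affix is an attested word).

    base_counts    : dict mapping each base to its token frequency
    attested_words : set of word types attested in the corpus
    (Both are hypothetical inputs chosen for this illustration.)
    """
    rng = random.Random(seed)
    bases = list(base_counts)
    weights = [base_counts[b] for b in bases]
    # Token frequency only influences which bases are drawn;
    # it does not enter the productivity score directly.
    sampled = rng.choices(bases, weights=weights, k=n_samples)
    # Simulate word construction, then check attestation.
    hits = sum(1 for base in sampled if base + affix in attested_words)
    return hits / n_samples


# Toy data, invented for illustration only.
toy_bases = {"dark": 5, "kind": 3, "table": 2}
toy_attested = {"darkness", "kindness"}
print(estimate_productivity("ness", toy_bases, toy_attested))
```

Because bases are sampled with replacement and weighted by frequency, a frequent base contributes more draws, but each draw counts only as attested or not attested, which is one way to read the claim that token frequency shapes the sampling without dominating the measure.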

