Maurizio Serva

CL Oct 25, 2025

Evolution of the lexicon: a probabilistic point of view

Maurizio Serva

The Swadesh approach for determining the temporal separation between two languages relies on the stochastic process of word replacement (when a completely new word emerges to represent a given concept). It is well known that the basic assumptions of the Swadesh approach are often unrealistic due to various contamination phenomena and misjudgments (horizontal transfers, variations over time and space of the replacement rate, incorrect assessments of cognacy relationships, presence of synonyms, and so on). As a consequence, the results cannot be completely correct. More importantly, even in the unrealistic case that all basic assumptions are satisfied, simple mathematics places limits on the accuracy of estimating the temporal separation between two languages. These limits, which are purely probabilistic in nature and are often neglected in lexicostatistical studies, are analyzed in detail in this article. Furthermore, we highlight that the evolution of a language's lexicon is also driven by another stochastic process: gradual lexical modification of words. We show that this process makes an equally major contribution to the reshaping of the vocabulary of languages over the centuries, and we show, from a purely probabilistic perspective, that taking this second random process into account significantly increases the precision in determining the temporal separation between two languages.
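To make the idea of a purely probabilistic limit concrete, here is a minimal sketch. It is not the analysis carried out in the article. It assumes the textbook idealization in which every word on a list of `n_words` items is replaced independently at the same constant `rate` in each of the two languages, so that the expected fraction of shared cognates after a time `t` is `exp(-2 * rate * t)`. The function names and the numerical values in the usage note are illustrative assumptions.

```python
import math


def separation_time(c, rate):
    """Point estimate of the separation time from the shared-cognate fraction c.

    Assumption: expected c = exp(-2 * rate * t), i.e. independent replacement
    at the same constant rate in both languages.
    """
    return -math.log(c) / (2.0 * rate)


def separation_time_std(c, n_words, rate):
    """Approximate standard error of the estimate (delta method).

    The observed cognate count is binomial, so c has standard deviation
    sqrt(c * (1 - c) / n_words); multiplying by |dt/dc| = 1 / (2 * rate * c)
    gives the expression below.
    """
    return math.sqrt((1.0 - c) / (n_words * c)) / (2.0 * rate)
```

For example, with a hypothetical rate of 0.1 replacements per millennium, a 200-word list and half of the words shared, `separation_time(0.5, 0.1)` gives about 3.47 millennia and `separation_time_std(0.5, 200, 0.1)` gives about 0.35 millennia. This uncertainty is present even when every assumption of the idealized model holds.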

CL Feb 20, 2018

Stability of meanings versus rate of replacement of words: an experimental test

Michele Pasquini, Maurizio Serva

The words of a language are randomly replaced in time by new ones, but it has long been known that words corresponding to some items (meanings) are less frequently replaced than others. Usually, the rate of replacement for a given item is not directly observable, but it is inferred from the estimated stability which, on the contrary, is observable. This idea goes back a long way in the lexicostatistical literature; nevertheless, nothing ensures that it gives the correct answer. The family of Romance languages allows for a direct test of the estimated stabilities against the replacement rates, since the proto-language (Latin) is known and the replacement rates can be explicitly computed. The output of the test is threefold: first, we prove that the standard approach, which tries to infer the replacement rates through the estimated stabilities, is sound; second, we are able to rewrite the fundamental formula of Glottochronology for a non-universal replacement rate (a rate which depends on the item); third, we give indisputable evidence that the stability ranking is far from being the same for different families of languages. This last result is also supported by comparison with the Malagasy family of dialects. As a side result, we also provide some evidence that Vulgar Latin, and not Late Classical Latin, is at the root of modern Romance languages.
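As an illustration of what an item-dependent replacement rate changes, the sketch below is one natural generalization and is not the formula derived in the paper, which the abstract does not state. It assumes that item `i` is replaced independently at its own constant rate `rates[i]` in both languages, so the expected shared fraction is the average of `exp(-2 * rates[i] * t)` over items. The function names are hypothetical.

```python
import math


def expected_shared_fraction(t, rates):
    """Expected fraction of shared cognates after time t with per-item rates."""
    return sum(math.exp(-2.0 * r * t) for r in rates) / len(rates)


def separation_time_item_rates(c, rates, tol=1e-10):
    """Invert expected_shared_fraction for t by bisection.

    With all rates positive, the expected fraction decreases strictly from 1
    towards 0 as t grows, so the root is unique.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    if not rates or any(r <= 0.0 for r in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")
    lo, hi = 0.0, 1.0
    # Grow the upper bound until it brackets the root.
    while expected_shared_fraction(hi, rates) > c:
        hi *= 2.0
    # Standard bisection on the bracketing interval.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_shared_fraction(mid, rates) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When all entries of `rates` are equal, the result reduces to the single-rate estimate `-log(c) / (2 * rate)`. With unequal rates no closed form is available in general, which is why the sketch inverts the relation numerically.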
