Polysemy and brevity versus frequency in language
This work provides incremental validation of linguistic laws for multiple languages, relevant for linguists and computational linguists studying word properties.
The study tested the robustness of Zipf's meaning-frequency law (more frequent words tend to be more polysemous) and the law of abbreviation (more frequent words tend to be shorter) across English, Dutch, and Spanish, finding that both laws hold overall in all three languages.
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The best known is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In previous work, we tested the robustness of these Zipfian laws for English, measuring word length roughly as the number of characters and distinguishing adult from child speech. In the present article, we extend the study to two further languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
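
To make the two laws concrete, the following minimal sketch shows one way a correlation analysis of this kind could be set up. It assumes a Spearman rank correlation, which is a common choice for monotonic tendencies but is not confirmed here as the statistic used in the study. The function names (average_ranks, pearson, spearman) and the six toy data points are hypothetical and purely illustrative; they are not data or results from the article.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for six imaginary words (NOT data from the study).
frequency = [1000, 400, 120, 60, 15, 3]  # occurrences in a corpus
length = [1, 3, 5, 6, 8, 14]             # e.g. characters, syllables or phonemes
meanings = [9, 12, 6, 4, 2, 1]           # number of senses (polysemy)

rho_length = spearman(frequency, length)      # law of abbreviation: expected < 0
rho_meanings = spearman(frequency, meanings)  # meaning-frequency law: expected > 0

print(f"frequency vs. length:   rho = {rho_length:.3f}")
print(f"frequency vs. meanings: rho = {rho_meanings:.3f}")
```

Under these assumptions, a negative coefficient between frequency and length is the pattern predicted by the law of abbreviation, and a positive coefficient between frequency and number of meanings is the pattern predicted by the meaning-frequency law. The same function can be applied unchanged to each measure of length (characters, syllables, phonemes).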