Caio Seguin

5.9ITApr 19, 2015

Compression and the origins of Zipf's law of abbreviation

R. Ferrer-i-Cancho, C. Bentz, C. Seguin

Languages across the world exhibit Zipf's law of abbreviation, namely more frequent words tend to be shorter. The generalized version of the law - an inverse relationship between the frequency of a unit and its magnitude - holds also for the behaviours of other species and the genetic code. The apparent universality of this pattern in human language and its ubiquity in other domains calls for a theoretical understanding of its origins. To this end, we generalize the information theoretic concept of mean code length as a mean energetic cost function over the probability and the magnitude of the types of the repertoire. We show that the minimization of that cost function and a negative correlation between probability and the magnitude of types are intimately related.

Caio Seguin

1 Paper