Compression and the origins of Zipf's law for word frequencies

arXiv:1605.01326v2, 44 citations
Originality: Highly original
AI Analysis

This work addresses a foundational problem in linguistics and information theory by offering a compact theory for statistical laws of language.

The paper explains Zipf's law for word frequencies through a new derivation based on optimal coding, and suggests that the law could originate from cognitive pressure for efficient communication.

Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model, but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine-tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication.
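For readers unfamiliar with the random typing model that the abstract uses as a point of comparison, the following is a minimal simulation sketch, not the paper's derivation. It assumes a four-letter alphabet, a space probability of 0.2, and uniformly chosen letters; the function names `random_typing` and `rank_frequency` and all parameter values are illustrative choices that do not come from the paper.

```python
import random
from collections import Counter

def random_typing(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Type n_chars characters at random; a space ends the current word.

    alphabet, p_space and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    words, current = [], []
    for _ in range(n_chars):
        if rng.random() < p_space:
            if current:  # skip empty words produced by consecutive spaces
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(alphabet))
    return words

def rank_frequency(words):
    """Return word counts sorted from most to least frequent."""
    return [count for _, count in Counter(words).most_common()]

words = random_typing(200_000)
freqs = rank_frequency(words)

# Print the frequency at a few ranks to inspect how quickly it decays.
for rank in (1, 10, 100):
    print(rank, freqs[rank - 1])
```

Plotting `freqs` against rank on log-log axes should show a roughly straight, stepped decay. The steps arise because all words of the same length are equally likely under these assumptions, and the slope depends on the chosen alphabet size and space probability.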

