CL · May 20, 2023

Revisiting Entropy Rate Constancy in Text

arXiv:2305.12084v2 · 137 citations
Originality: Synthesis-oriented
AI Analysis

The paper's findings challenge foundational linguistic theories of efficient communication, with potential impact on research in psycholinguistics and natural language processing.

The paper re-evaluates the entropy rate constancy principle, an early source of support for the uniform information density hypothesis in human language. Using neural language models, it finds no clear evidence for the principle across datasets, model sizes, and languages.

The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel & Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.
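To make the measured quantity concrete, the following is a minimal sketch of how per-token surprisal can be averaged by sentence position within a document. It is an illustration only, not the setup of either paper: the add-one-smoothed bigram model, the function names (`train_bigram`, `sentence_surprisal`, `surprisal_by_position`), and the `vocab_size` parameter are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start symbol


def train_bigram(sentences):
    """Count bigram contexts and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = [START] + list(sent)
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams


def sentence_surprisal(sent, contexts, bigrams, vocab_size):
    """Mean per-token surprisal in bits under an add-one-smoothed bigram model."""
    tokens = [START] + list(sent)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
    return total / len(sent)


def surprisal_by_position(documents, contexts, bigrams, vocab_size):
    """Average sentence-level surprisal grouped by sentence index in the document."""
    sums, counts = defaultdict(float), defaultdict(int)
    for doc in documents:
        for i, sent in enumerate(doc):
            if not sent:  # skip empty sentences to avoid dividing by zero
                continue
            sums[i] += sentence_surprisal(sent, contexts, bigrams, vocab_size)
            counts[i] += 1
    return {i: sums[i] / counts[i] for i in sorted(sums)}
```

Here each document is a list of sentences and each sentence a list of tokens. Because this toy model scores every sentence without its preceding context, the returned mapping from sentence index to mean surprisal is the kind of curve one would inspect for a trend across positions. A neural language model could be substituted for the bigram scorer.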

