The Harmonic Structure of Information Contours
Aimed at researchers in computational linguistics and cognitive science, this work addresses the challenge of understanding linguistic organization and offers a novel framework for analyzing structural pressures, although its extension of the uniform information density hypothesis is incremental.
The study tackles the question of why the information rate of language fluctuates, proposing that these fluctuations are shaped by periodic patterns. It finds consistent evidence of such periodicity across six languages, with many of the dominant frequencies aligning with discourse structure.
The uniform information density (UID) hypothesis proposes that speakers aim to distribute information evenly throughout a text, balancing production effort and listener comprehension difficulty. However, language typically does not maintain a strictly uniform information rate; instead, it fluctuates around a global average. These fluctuations are often explained by factors such as syntactic constraints, stylistic choices, or audience design. In this work, we explore an alternative perspective: that these fluctuations may be influenced by an implicit linguistic pressure towards periodicity, where the information rate oscillates at regular intervals, potentially across multiple frequencies simultaneously. We apply harmonic regression and introduce a novel extension called time scaling to detect and test for such periodicity in information contours. Analyzing texts in English, Spanish, German, Dutch, Basque, and Brazilian Portuguese, we find consistent evidence of periodic patterns in information rate. Many dominant frequencies align with discourse structure, suggesting these oscillations reflect meaningful linguistic organization. Beyond highlighting the connection between information rate and discourse structure, our approach offers a general framework for uncovering structural pressures at various levels of linguistic granularity.
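The following is a minimal sketch of harmonic regression on an information contour. It is an illustration, not the authors' implementation. It assumes the contour is a sequence of per-token surprisal values (for example, from a language model) and that candidate periods are given in tokens. The helper names `harmonic_design` and `harmonic_f_test` are hypothetical. The sketch fits an intercept plus one sine/cosine pair per candidate period by least squares, then compares that fit with an intercept-only model, which corresponds to a flat, perfectly uniform contour, using a nested-model F statistic. The time-scaling extension is not reproduced here.

```python
import numpy as np


def harmonic_design(n_tokens, periods):
    """Design matrix: an intercept column plus one sine/cosine pair per period.

    Positions are token indices 0..n_tokens-1; periods are measured in tokens
    (an assumption of this sketch).
    """
    t = np.arange(n_tokens, dtype=float)
    columns = [np.ones(n_tokens)]
    for period in periods:
        omega = 2.0 * np.pi / period  # angular frequency for this period
        columns.append(np.sin(omega * t))
        columns.append(np.cos(omega * t))
    return np.column_stack(columns)


def harmonic_f_test(surprisal, periods):
    """Compare an intercept-only (flat) model with a harmonic model.

    Returns (F statistic, numerator df, denominator df, fitted coefficients),
    with coefficients ordered as [intercept, sin_1, cos_1, sin_2, cos_2, ...].
    """
    y = np.asarray(surprisal, dtype=float)
    n = y.size
    X = harmonic_design(n, periods)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = float(np.sum((y - X @ beta) ** 2))   # residuals of harmonic model
    rss_null = float(np.sum((y - y.mean()) ** 2))   # residuals of intercept-only model
    df_num = X.shape[1] - 1                         # number of harmonic terms
    df_den = n - X.shape[1]                         # residual degrees of freedom
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den, beta


if __name__ == "__main__":
    # Synthetic contour: mean surprisal 5 with a 40-token oscillation plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(400)
    contour = 5.0 + 1.5 * np.sin(2 * np.pi * t / 40) + rng.normal(0.0, 1.0, t.size)
    for candidate in (40, 7):
        f_stat, df_num, df_den, _ = harmonic_f_test(contour, [candidate])
        print(f"period={candidate:>3}  F({df_num}, {df_den}) = {f_stat:.2f}")
```

On the synthetic contour, the true 40-token period yields a much larger F statistic than an unrelated 7-token period. A p-value could be obtained with `scipy.stats.f.sf(f_stat, df_num, df_den)`, but that reference distribution holds only under independent Gaussian residuals. Real surprisal contours are autocorrelated, and scanning many candidate periods calls for a multiple-comparison correction, so a permutation or block-bootstrap null would be a more conservative choice.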