Multiperiodic Processes: Ergodic Sources with a Sublinear Entropy
This work provides a theoretical model of entropy properties in processes such as natural language; its contribution is incremental in nature.
The authors introduce multiperiodic processes, which are stationary ergodic processes with vanishing entropy rates, and show that, under specific parameterizations, these processes exhibit Zipf's law for relative frequencies and Hilberg's law for block entropy growth, a property relevant to statistical language models.
We construct multiperiodic processes, a simple example of stationary ergodic (but not mixing) processes over natural numbers whose entropy rate vanishes under a mild condition. Multiperiodic processes are supported on randomly shifted deterministic sequences called multiperiodic sequences, which can be efficiently generated using an algorithm called the Infinite Clock. Under a suitable parameterization, the relative frequencies of particular numbers in multiperiodic sequences follow Zipf's law. In exactly the same setting, the respective multiperiodic processes satisfy an asymptotic power-law growth of block entropy, called Hilberg's law. In particular, Hilberg's law is deemed to hold for statistical language models.
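For orientation, the three quantitative notions named in the abstract can be written in their generic textbook forms. This is only a sketch: the symbols $h$, $f(k)$, $\alpha$ and $\beta$ are notation assumed here for illustration, not taken from the paper, and the precise statements, constants and conditions are those given in the paper itself.

```latex
% Assumed notation: X_1^n = (X_1, ..., X_n) is a block of length n,
% H(.) is Shannon entropy, f(k) is the relative frequency of the number k.

% Vanishing entropy rate (block entropy is sublinear in n):
\[
  h \;=\; \lim_{n \to \infty} \frac{H(X_1^n)}{n} \;=\; 0 .
\]

% Zipf's law, generic power-law form with an assumed exponent \alpha:
\[
  f(k) \;\propto\; k^{-\alpha} .
\]

% Hilberg's law, generic power-law growth of block entropy
% with an assumed exponent \beta:
\[
  H(X_1^n) \;\propto\; n^{\beta}, \qquad 0 < \beta < 1 .
\]
```

Read this way, Hilberg's law is consistent with a vanishing entropy rate: if block entropy grows like $n^{\beta}$ with $\beta < 1$, then $H(X_1^n)/n$ behaves like $n^{\beta - 1}$, which tends to zero.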