IT, LG, ST | Dec 15, 2025

From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis

arXiv:2512.13491v2 | 3 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a theoretical foundation linking statistical laws of language to scaling behavior in machine learning. The contribution is incremental in that it formalizes connections that had already been noted in more specific settings.

The paper shows that the neural scaling law, which describes how a model's cross entropy rate changes with training tokens, parameters, and compute, can be derived from Zipf's law, the statement that the token distribution has a power-law tail, under certain broad assumptions. The deduction runs from Zipf's law to Heaps' law, from Heaps' law to Hilberg's hypothesis, and from Hilberg's hypothesis to the neural scaling law. It is illustrated with a toy example, the Santa Fe process.
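As a rough orientation, the four statements in this chain are often written as power laws of the following shape. This is only a sketch in commonly used notation, not the paper's own formulation: the symbols (rank r, token count n, vocabulary size V(n), block entropy H(n), entropy rate h, constant A, exponents \alpha and \beta) are assumptions made here, and the relation \beta = 1/\alpha is the textbook case for independently sampled tokens, which may differ from the assumptions the paper actually uses.

```latex
% Illustrative forms only; notation, exponents, and constants are assumptions.
\begin{align*}
\text{Zipf:}    &\quad p(r) \propto r^{-\alpha}, \quad \alpha > 1 \\
\text{Heaps:}   &\quad V(n) \propto n^{\beta}, \quad \beta = 1/\alpha \\
\text{Hilberg:} &\quad H(n) \approx h\,n + A\,n^{\beta}, \quad 0 < \beta < 1 \\
\text{Scaling:} &\quad H(n) - H(n-1) \approx h + A\beta\,n^{\beta - 1}
\end{align*}
```

The last line is the discrete derivative of the Hilberg form: the excess of the per-token entropy over the entropy rate h decays as a power law in the number of tokens seen, which is the general shape of a data scaling law.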

We inspect the deductive connection between the neural scaling law and Zipf's law, two statements discussed in machine learning and quantitative linguistics. The neural scaling law describes how the cross entropy rate of a foundation model, such as a large language model, changes with the number of training tokens, the number of parameters, and the amount of compute. By contrast, Zipf's law posits that the distribution of tokens exhibits a power-law tail. Whereas similar claims have been made in more specific settings, we show that the neural scaling law is a consequence of Zipf's law under certain broad assumptions that we reveal systematically. The derivation proceeds in three steps: we derive Heaps' law on vocabulary growth from Zipf's law, Hilberg's hypothesis on entropy scaling from Heaps' law, and the neural scaling law from Hilberg's hypothesis. We illustrate these inference steps with a toy example, the Santa Fe process, which satisfies all four statistical laws.
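The first step of the derivation, from Zipf's law to Heaps' law, can be checked numerically in the simplest setting. The sketch below is not the paper's construction: it assumes tokens are drawn independently from a truncated Zipf distribution with exponent alpha, computes the expected number of distinct tokens among n draws, and estimates the growth exponent from two sample sizes. The function names, the truncation point, and the sample sizes are all hypothetical choices made for illustration.

```python
import math


def zipf_probs(alpha, support):
    """Truncated Zipf distribution: p_k proportional to k^(-alpha), k = 1..support."""
    weights = [k ** -alpha for k in range(1, support + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_vocab(probs, n):
    """Expected number of distinct tokens among n independent draws.

    Uses E[V(n)] = sum_k (1 - (1 - p_k)^n), evaluated with log1p/expm1
    so that very small probabilities do not lose precision.
    """
    return sum(-math.expm1(n * math.log1p(-p)) for p in probs)


def heaps_exponent(alpha, n_small=1_000, n_large=100_000, support=200_000):
    """Estimate beta in V(n) ~ n^beta from the log-log slope between two sizes."""
    probs = zipf_probs(alpha, support)
    v_small = expected_vocab(probs, n_small)
    v_large = expected_vocab(probs, n_large)
    return math.log(v_large / v_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    alpha = 2.0  # assumed Zipf exponent, chosen only for illustration
    print("estimated Heaps exponent:", heaps_exponent(alpha))
    print("textbook prediction 1/alpha:", 1 / alpha)
```

Under these assumptions the estimated slope should land close to 1/alpha. The expectation is computed in closed form rather than by sampling, so the result is deterministic. Independent sampling covers only this one link of the chain: the later steps concern entropy and cross entropy, which this sketch does not model.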

