ME CL ML Nov 26, 2025

Zipf Distributions from Two-Stage Symbolic Processes: Stability Under Stochastic Lexical Filtering

arXiv:2511.21060v1, 2 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses a foundational problem in linguistics and complex systems by offering a non-communicative explanation for Zipf's law. The advance is incremental because it builds on existing geometric models.

The study addresses the origin of Zipf's law in language by proposing a geometric model with no linguistic elements. It shows that Zipf-like behavior can arise from purely symbolic processes, with simulations that match data from English, Russian, and mixed-genre sources.

Zipf's law in language has no agreed-upon origin, and its source is debated across fields. This study explains Zipf-like behavior using geometric mechanisms without linguistic elements. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, which generates a geometric distribution of word lengths. Two opposing exponential effects, the growth in the number of possible words with length and the decay in the probability of any single word of that length, combine to yield a power-law rank-frequency curve whose exponent is determined by the alphabet size and the blank-symbol probability. Simulations support these predictions and match English, Russian, and mixed-genre data. The symbolic model suggests that Zipf-type laws arise from geometric constraints, not communicative efficiency.
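The mechanism described in the abstract can be illustrated with the classic uniform random-typing construction. The sketch below is a minimal illustration under stated assumptions, not the paper's FCWM implementation: it assumes every symbol is a blank with probability `blank_prob` and otherwise one of `alphabet_size` equally likely letters, and it does not model the lexical filtering stage named in the title. The function names and the closed-form exponent, 1 - ln(1 - p) / ln(M), come from the textbook random-typing model and are assumptions relative to the source.

```python
import math
import random
from collections import Counter


def predicted_exponent(alphabet_size: int, blank_prob: float) -> float:
    """Rank-frequency exponent of uniform random typing (textbook result,
    assumed here; the paper's own formula may differ)."""
    return 1.0 - math.log(1.0 - blank_prob) / math.log(alphabet_size)


def fitted_exponent(alphabet_size: int, blank_prob: float, max_len: int = 12) -> float:
    """Least-squares slope of log-probability against log-rank, using one
    point per word length (the last rank occupied by that length class)."""
    per_letter = (1.0 - blank_prob) / alphabet_size
    xs, ys = [], []
    last_rank = 0
    for n in range(1, max_len + 1):
        last_rank += alphabet_size ** n  # number of words grows exponentially
        xs.append(math.log(last_rank))
        # Probability of one specific word of length n (up to normalization)
        # decays exponentially with n.
        ys.append(math.log(blank_prob) + n * math.log(per_letter))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


def random_typing(n_symbols: int, alphabet_size: int, blank_prob: float,
                  seed: int = 0) -> Counter:
    """Simulate a symbol stream and count the non-empty words it contains.
    A trailing word not closed by a blank is discarded."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    word = []
    for _ in range(n_symbols):
        if rng.random() < blank_prob:
            if word:  # consecutive blanks produce no word
                counts[tuple(word)] += 1
                word = []
        else:
            word.append(rng.randrange(alphabet_size))
    return counts


if __name__ == "__main__":
    M, p = 26, 0.2
    print("predicted exponent:", predicted_exponent(M, p))
    print("fitted exponent:   ", fitted_exponent(M, p))
    counts = random_typing(200_000, M, p)
    total = sum(counts.values())
    mean_len = sum(len(w) * c for w, c in counts.items()) / total
    print("mean word length:  ", mean_len, "(geometric mean is 1/p =", 1 / p, ")")
```

In this sketch the fitted slope tracks the closed-form value because log-rank and log-probability are both approximately linear in word length, which is the sense in which two exponential effects combine into a power law. Non-empty word lengths follow a geometric distribution with mean 1/p, which the simulation can be used to check.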

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
