Empirical observations of ultraslow diffusion driven by the fractional dynamics in languages: Dynamical statistical properties of word counts of already popular words
This paper provides empirical evidence for ultraslow diffusion, a phenomenon that is well studied in theory, in language dynamics; the contribution is incremental but addresses a gap in real-world observations.
The paper reports ultraslow-like diffusion in the word-count time series of already popular words, drawn from newspapers, blogs, and Wikipedia page views. The diffusion is largely explained by a random walk model with power-law forgetting (exponent β≈0.5), and a Poisson generative model driven by that walk approximately reproduces the empirical mean-squared displacement, power spectrum density, and probability density function.
Ultraslow diffusion (i.e. logarithmic diffusion) has been extensively studied theoretically, but has hardly been observed empirically. In this paper, firstly, we find ultraslow-like diffusion in the time series of word counts of already popular words by analysing three different nationwide language databases: (i) newspaper articles (Japanese), (ii) blog articles (Japanese), and (iii) page views of Wikipedia (English, French, Chinese, and Japanese). Secondly, we use theoretical analysis to show that this diffusion is basically explained by a random walk model with power-law forgetting with exponent $β\approx 0.5$, which is related to the fractional Langevin equation. The exponent $β$ characterises the speed of forgetting, and $β\approx 0.5$ corresponds to (i) the border (or threshold) between stationary and nonstationary behaviour and (ii) dynamics midway between IID noise ($β=1$) and the normal random walk ($β=0$). Thirdly, a generative model of the time series of word counts of already popular words, which is a kind of Poisson process whose Poisson parameter is sampled from the above-mentioned random walk model, can almost reproduce not only the empirical mean-squared displacement but also the power spectrum density and the probability density function.
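The generative model described above can be illustrated with a minimal Python sketch. It is not the authors' exact specification: the kernel form $(s+1)^{-β}$, the Gaussian driving noise, the constant base level, the clipping of the Poisson parameter at zero, and the function names `forgetting_walk`, `poisson_counts`, and `msd` are all assumptions made for illustration. The sketch builds a random walk whose past steps are down-weighted by a power law, uses it as the Poisson parameter for synthetic word counts, and estimates the mean-squared displacement of the walk.

```python
import numpy as np


def forgetting_walk(n_steps, beta, rng, sigma=1.0):
    """Random walk whose past steps are down-weighted by a power-law kernel.

    Assumed form: r(t) = sum_{s=0}^{t} (s + 1)^(-beta) * eta(t - s),
    with eta IID Gaussian noise. For beta = 0 this is an ordinary random walk.
    """
    eta = rng.normal(0.0, sigma, size=n_steps)        # IID driving noise
    kernel = (np.arange(n_steps) + 1.0) ** (-beta)    # assumed forgetting kernel
    # Full convolution truncated to the first n_steps gives the causal sum.
    return np.convolve(eta, kernel)[:n_steps]


def poisson_counts(r, base, rng):
    """Sample counts from a Poisson law whose parameter follows base + r(t)."""
    lam = np.clip(base + r, 0.0, None)  # Poisson parameter must be non-negative
    return rng.poisson(lam)


def msd(x, lags):
    """Time-averaged mean-squared displacement of series x at the given lags."""
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])


rng = np.random.default_rng(0)
r = forgetting_walk(4096, beta=0.5, rng=rng)        # beta ~ 0.5 as in the abstract
counts = poisson_counts(r, base=1000.0, rng=rng)    # hypothetical base popularity
lags = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256])
print(msd(r, lags))
```

Plotting the printed values against `lags` on a semi-logarithmic axis is one way to check by eye whether the growth looks roughly logarithmic for this choice of kernel; varying `beta` between 0 and 1 shows the transition from ordinary random-walk growth towards a flat, noise-like displacement.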