CL · SOC-PH · AP · Mar 1, 2015

Variation of word frequencies in Russian literary texts

arXiv:1503.00339v2 · 13 citations

Originality: Synthesis-oriented

AI Analysis

This research addresses the problem of understanding linguistic patterns in texts, a question relevant to computational linguistics and natural language processing. Its contribution is incremental: it applies existing statistical models to a specific dataset.

The study analyzes word frequency variation in Russian literary texts and finds that the standard deviation of a word's frequency follows a power law in its average frequency with exponent 0.62. As a result, rarer words show relatively greater volatility, or 'burstiness'.
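Why an exponent below 1 means rarer words are *relatively* more volatile can be seen in one line. Assume the power law has the form $\sigma = a\,\mu^{\beta}$, where $\mu$ is a word's mean frequency, $\sigma$ its standard deviation across texts, and $a$ a positive constant (the symbols $a$, $\mu$, $\sigma$, $\beta$ are notation introduced here, not taken from the paper). Dividing by $\mu$ gives the coefficient of variation:

$$
\sigma = a\,\mu^{\beta},\quad \beta = 0.62
\quad\Longrightarrow\quad
\frac{\sigma}{\mu} = a\,\mu^{\beta-1} = a\,\mu^{-0.38}.
$$

Since $\beta - 1 < 0$, the ratio $\sigma/\mu$ grows as $\mu$ shrinks: halving a word's mean frequency multiplies its relative volatility by $2^{0.38} \approx 1.30$.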

We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent $0.62$, showing that rarer words have a relatively larger degree of frequency volatility (i.e., "burstiness"). Several latent factor models have been estimated to investigate the structure of the word frequency distribution. The dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of latent factors.
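The power-law relation between a word's mean frequency and its standard deviation across texts can be estimated roughly as follows. This is a minimal sketch, not the authors' procedure. It assumes the corpus is given as a hypothetical count matrix with one row per text and one column per word, that frequencies are relative frequencies within each text, and that the exponent is fitted by ordinary least squares on a log-log scale. The helper names `word_frequency_stats` and `fit_power_law` are illustrative, and the paper's estimator may differ.

```python
import numpy as np


def word_frequency_stats(counts):
    """Per-word mean and standard deviation of relative frequency across texts.

    counts: array of shape (n_texts, n_words) with raw occurrence counts
    (hypothetical input layout; needs at least two texts for ddof=1).
    """
    counts = np.asarray(counts, dtype=float)
    # Normalise each text to relative frequencies so texts of different length are comparable.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Sample standard deviation across texts (axis 0), one value per word.
    return freqs.mean(axis=0), freqs.std(axis=0, ddof=1)


def fit_power_law(mean, std):
    """Fit std = a * mean**beta by least squares in log-log space; return (beta, a)."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    # Words with zero mean or zero spread have undefined logarithms, so drop them.
    mask = (mean > 0) & (std > 0)
    # A degree-1 polynomial fit returns [slope, intercept] = [beta, log(a)].
    beta, log_a = np.polyfit(np.log(mean[mask]), np.log(std[mask]), 1)
    return beta, np.exp(log_a)


if __name__ == "__main__":
    # Synthetic check only: (mean, std) pairs built to follow an exponent of 0.62 exactly.
    mu = np.logspace(-6, -2, 50)
    sigma = 0.5 * mu ** 0.62
    beta, a = fit_power_law(mu, sigma)
    print(f"beta = {beta:.2f}, a = {a:.2f}")  # beta = 0.62, a = 0.50
```

On a real corpus, the fitted `beta` would play the role of the exponent quoted in the abstract. A least-squares fit in log space weights all words equally regardless of frequency, which is one of several reasonable choices for this kind of estimate.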

