CL · Mar 6, 2025

Quantifying patterns of punctuation in modern Chinese prose

arXiv:2503.04449v1 · 3 citations · h-index: 32 · Chaos
Originality: Synthesis-oriented
AI Analysis

This work incrementally extends universal punctuation pattern analysis to Chinese literature, supporting cross-linguistic applicability.

The study analyzed punctuation patterns in three contemporary Chinese literary works and found that they follow Zipf's law and a Weibull distribution similar to Western texts, though with less frequent large spacing and more variability in sentence length.
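As a rough illustration of the Zipf's law check mentioned above, the following sketch builds a rank-frequency table in which punctuation marks are counted as tokens alongside words, then estimates the Zipf exponent by a least-squares fit in log-log space. The function names `rank_frequency` and `zipf_exponent` are hypothetical, the input is assumed to be already segmented into words and punctuation marks, and the fitting procedure is an assumption for illustration rather than the method used in the study.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count) triples, most frequent token first.

    Punctuation marks are treated as ordinary tokens (an assumption made
    here so that their effect on the rank-frequency curve can be seen).
    """
    counts = Counter(tokens)
    return [
        (rank, tok, n)
        for rank, (tok, n) in enumerate(counts.most_common(), start=1)
    ]


def zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ r^(-alpha) from frequencies sorted descending.

    Uses an ordinary least-squares fit of log(f) against log(rank).
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx  # slope is negative for a decaying curve
```

Comparing the fitted exponent and the fit quality with and without punctuation tokens is one simple way to probe whether punctuation changes adherence to the law; a plain log-log regression is known to be a crude estimator, so it should be read as a first look only.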

Recent research shows that punctuation patterns in texts exhibit universal features across languages. Analysis of Western classical literature reveals that the distances between consecutive punctuation marks follow a discrete Weibull distribution, a model typically used in survival analysis. Extending this analysis to Chinese literature, represented here by three notable contemporary works, shows that Zipf's law applies to Chinese texts much as it does to Western texts, where taking punctuation into account also improves adherence to the law. Additionally, the distribution of distances between punctuation marks in Chinese texts follows the Weibull model, though large distances are less frequent than in English translations. The distances between sentence-ending punctuation marks, which correspond to sentence lengths, deviate more from this pattern, reflecting greater flexibility in sentence length. This variability supports the formation of complex, multifractal sentence structures, particularly evident in Gao Xingjian's "Soul Mountain". These findings indicate that Chinese and Western texts share universal patterns of punctuation and word distribution, underscoring that these patterns apply broadly across languages.

