CL · Dec 10, 2014

Statistical Patterns in Written Language

arXiv:1412.3336v2 · 31 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of understanding language structure beyond the reach of traditional linguistics, with researchers in complex systems as its audience. Its contribution is incremental, since it reviews existing work rather than presenting new results.

The paper reviews recent contributions in quantitative linguistics that use statistical and information-theoretic techniques to uncover medium- and long-range patterns in written language, offering a new perspective on its complex organization.

Quantitative linguistics has been accepted, in the last few decades, within the admittedly blurry boundaries of the field of complex systems. A growing host of applied mathematicians and statistical physicists devote their efforts to disclosing regularities, correlations, patterns, and structural properties of language streams, using techniques borrowed from statistics and information theory. Overall, results can still be categorized as modest, but the prospects are promising: medium- and long-range features in the organization of human language (which are beyond the scope of traditional linguistics) have already emerged from this kind of analysis and continue to be reported, contributing a new perspective to our understanding of this most complex communication system. This short book is intended to review some of these recent contributions.

