CLNC Sep 18, 2021

Dependency distance minimization predicts compression

arXiv:2109.08900v2648 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a second-order prediction in linguistics, linking syntax and word structure, but is incremental as it builds on established principles and uses existing data.

The study tested the theoretical prediction that dependency distance minimization (DDm) implies word length compression, using parallel treebanks and a new scoring method. It found the prediction confirmed when measuring word length in phonemes but not in syllables, and highlighted limitations of traditional dependency distance measures.

Dependency distance minimization (DDm) is a well-established principle of word order. It has been predicted theoretically that DDm implies compression, namely the minimization of word lengths. This is a second-order prediction because it links a principle with another principle, rather than a principle with a manifestation as in a first-order prediction. Here we test that second-order prediction with a parallel collection of treebanks, controlling for annotation style with Universal Dependencies and Surface-Syntactic Universal Dependencies. To test it, we use a recently introduced score that has many mathematical and statistical advantages over the widely used sum of dependency distances. We find that the prediction is confirmed by the new score when word lengths are measured in phonemes, independently of the annotation style, but not when word lengths are measured in syllables. In contrast, one of the most widely used scores, i.e., the sum of dependency distances, fails to confirm that prediction, showing the weakness of raw dependency distances for research on word order. Finally, our findings expand the theory of natural communication by linking two distinct levels of organization, namely syntax (word order) and word-internal structure.
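To make the "sum of dependency distances" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's code: the function names, the head-index encoding (1-based head positions, 0 for the root, as in CoNLL-U-style treebanks), and the toy sentence are all hypothetical. The random baseline (n² − 1)/3 is the known expected sum for a tree of n words under a uniformly random word order. The abstract does not define the recently introduced score, so it is not reproduced here.

```python
def sum_dependency_distances(heads):
    """Sum of |position(word) - position(head)| over all non-root words.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root (assumed encoding, CoNLL-U style).
    """
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def expected_random_sum(n):
    """Expected sum of dependency distances for a tree of n words
    when the words are placed in a uniformly random linear order."""
    return (n * n - 1) / 3


# Hypothetical toy sentence: "She reads long books"
# She -> reads, reads = root, long -> books, books -> reads
heads = [2, 0, 4, 2]

observed = sum_dependency_distances(heads)   # 1 + 1 + 2
baseline = expected_random_sum(len(heads))   # (16 - 1) / 3
print(observed, baseline)
```

Comparing the observed sum against a baseline, rather than reading it raw, hints at why the abstract calls raw dependency distances weak: the raw sum grows with sentence length, so it is not directly comparable across sentences of different sizes.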

