CLJul 12, 2022

The expected sum of edge lengths in planar linearizations of trees. Theory and applications

arXiv:2207.05564v42 citationsh-index: 31
Originality Incremental advance
AI Analysis

This work addresses the need for more flexible random baselines in dependency distance minimization studies for computational linguistics, offering incremental improvements over existing projective baselines.

The paper tackles the problem of establishing random baselines for dependency tree edge lengths in natural language sentences, focusing on planar permutations as a weaker constraint than projective orderings. It provides an O(n)-time algorithm to compute the expected sum of edge lengths under planarity and shows that the gap between actual dependency distance and the baseline decreases with stronger formal constraints.

Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), that can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive a $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes