CL · IT · Apr 18, 2021

Linguistic Dependencies and Statistical Dependence

Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

arXiv:2104.08685v3 · 30.86 · 64 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a foundational question in NLP and cognitive science: how statistical dependence between words relates to linguistic dependency. The advance is incremental, since it builds on prior PMI-based methods and improves them with contextualized estimates.

The paper investigates whether statistical dependence between words, measured as contextualized pointwise mutual information (CPMI) estimated from large pretrained language models, corresponds to linguistic dependencies. Dependency trees that maximize CPMI achieve an unlabelled undirected attachment score of at most about 0.5. This is far above chance and consistently above a non-contextualized PMI baseline, but generally comparable to a simple baseline that connects adjacent words.

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture.
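The evaluation described in the abstract can be illustrated with a small sketch: given a symmetric matrix of pairwise word scores, extract a maximum spanning tree and score it against gold edges with the unlabelled undirected attachment score (UUAS), alongside the adjacent-word baseline. This is only an illustration under stated assumptions. The score matrix and gold edges below are made-up toy values, not CPMI estimates from any model; the function names are hypothetical; and plain Prim's algorithm is used here for simplicity, which is not necessarily the tree-extraction procedure the authors use.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge between word positions, stored as (low, high)


def max_spanning_tree(scores: List[List[float]]) -> Set[Edge]:
    """Prim's algorithm on a dense symmetric score matrix (assumed stand-in for CPMI)."""
    n = len(scores)
    if n == 0:
        return set()
    in_tree = {0}
    edges: Set[Edge] = set()
    while len(in_tree) < n:
        best = None  # (score, node inside tree, node outside tree)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or scores[i][j] > best[0]:
                    best = (scores[i][j], i, j)
        _, i, j = best
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges


def uuas(predicted: Set[Edge], gold: Set[Edge]) -> float:
    """Fraction of gold edges recovered, ignoring direction and labels."""
    if not gold:
        raise ValueError("gold edge set must be non-empty")
    norm = lambda es: {(min(a, b), max(a, b)) for a, b in es}
    return len(norm(predicted) & norm(gold)) / len(norm(gold))


def adjacency_baseline(n: int) -> Set[Edge]:
    """Baseline that connects each word to its right-hand neighbour."""
    return {(i, i + 1) for i in range(n - 1)}


# Toy 4-word sentence: invented scores and invented gold edges.
toy_scores = [
    [0.0, 2.0, 0.1, 0.3],
    [2.0, 0.0, 0.2, 1.5],
    [0.1, 0.2, 0.0, 1.8],
    [0.3, 1.5, 1.8, 0.0],
]
toy_gold = {(0, 1), (1, 2), (1, 3)}

tree = max_spanning_tree(toy_scores)
print("tree edges:", sorted(tree))
print("tree UUAS:", uuas(tree, toy_gold))
print("adjacency UUAS:", uuas(adjacency_baseline(4), toy_gold))
```

In this toy case the score-maximizing tree and the adjacency baseline each recover two of the three gold edges, which mirrors in miniature the kind of comparison the abstract reports.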
