CL · Apr 3, 2021

Multi-Unit Directional Measures of Association: Moving Beyond Pairs of Words

arXiv:2104.01297v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work expands corpus-based association methods for linguistics and NLP by generalizing across varying sequence lengths and representations, though the advance is incremental because it builds on existing pairwise measures.

The paper tackles the problem of quantifying directional association in sequences beyond word pairs by formulating multi-unit measures, and finds that these measures are stable across eight languages and provide unique rankings of associated sequences.

This paper formulates and evaluates a series of multi-unit measures of directional association, building on the pairwise ΔP measure, that are able to quantify association in sequences of varying length and type of representation. Multi-unit measures face an additional segmentation problem: once the implicit length constraint of pairwise measures is abandoned, association measures must also identify the borders of meaningful sequences. This paper takes a vector-based approach to the segmentation problem by using 18 unique measures to describe different aspects of multi-unit association. An examination of these measures across eight languages shows that they are stable across languages and that each provides a unique rank of associated sequences. Taken together, these measures expand corpus-based approaches to association by generalizing across varying lengths and types of representation.
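For orientation, the pairwise ΔP measure that the paper builds on can be sketched as follows. This is a minimal illustration of the standard pairwise definition, ΔP(Y|X) = P(Y|X) − P(Y|¬X), computed in both directions over adjacent word pairs; it is not the paper's multi-unit formulation, which the abstract does not spell out. The function name `delta_p`, the toy input, and the choice to return 0.0 when a conditional probability is undefined are assumptions made for this example.

```python
from collections import Counter


def delta_p(tokens):
    """Return {(x, y): (lr, rl)} for every adjacent pair in tokens.

    lr = P(y | x) - P(y | not x)   (left-to-right association)
    rl = P(x | y) - P(x | not y)   (right-to-left association)
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())

    left = Counter()   # how often each word fills the first slot
    right = Counter()  # how often each word fills the second slot
    for (x, y), n in bigrams.items():
        left[x] += n
        right[y] += n

    scores = {}
    for (x, y), a in bigrams.items():
        b = left[x] - a          # x followed by something other than y
        c = right[y] - a         # y preceded by something other than x
        d = total - a - b - c    # pairs containing neither x first nor y second
        # Assumption: an undefined conditional probability is treated as 0.0.
        lr = a / (a + b) - (c / (c + d) if c + d else 0.0)
        rl = a / (a + c) - (b / (b + d) if b + d else 0.0)
        scores[(x, y)] = (lr, rl)
    return scores


if __name__ == "__main__":
    for pair, (lr, rl) in delta_p("a b a b a c".split()).items():
        print(pair, round(lr, 3), round(rl, 3))
```

The two scores for a pair generally differ, which is the sense in which the measure is directional: a word can strongly predict its right neighbour without being strongly predicted by it.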

Code Implementations: 1 repo