CL · Feb 7, 2021

Word frequency-rank relationship in tagged texts

arXiv:2102.10992v2

Originality: Synthesis-oriented

AI Analysis

For computational linguists and natural language processing researchers, this work incrementally advances the understanding of word distributions by relating frequency-rank behavior to grammatical function.

This paper analyzes the frequency-rank relationship of words within sub-vocabularies of nouns, verbs, and other grammatical classes in English literary works. It finds statistically significant differences between these classes relative to a null hypothesis in which the words of each class are uniformly distributed across the frequency-ranked vocabulary of the whole work, suggesting that frequency-rank relationships reflect linguistic features tied to grammatical function.

We analyze the frequency-rank relationship in sub-vocabularies corresponding to three different grammatical classes (nouns, verbs, and others) in a collection of literary works in English, whose words have been automatically tagged according to their grammatical role. Comparing against a null hypothesis which assumes that the words belonging to each class are uniformly distributed across the frequency-ranked vocabulary of the whole work, we find statistically significant differences between the three classes. These results suggest that frequency-rank relationships may reflect linguistic features associated with grammatical function.
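The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the input format (a list of `(word, tag)` pairs already produced by some tagger), the function names, the rule that assigns each word its most frequent tag, and the choice of mean rank as the test statistic are all assumptions made for illustration. The null hypothesis is simulated by drawing the same number of ranks uniformly at random from the whole frequency-ranked vocabulary.

```python
import random
from collections import Counter


def rank_vocabulary(tagged_tokens):
    """Return (words sorted by descending frequency, word -> class map).

    Assumption: each word type is assigned the class it is tagged with most often.
    Frequency ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(word for word, _ in tagged_tokens)
    tag_counts = {}
    for word, tag in tagged_tokens:
        tag_counts.setdefault(word, Counter())[tag] += 1
    word_class = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked, word_class


def class_ranks(ranked, word_class, cls):
    """Ranks (1 = most frequent) in the whole vocabulary of the words in class `cls`."""
    return [r for r, w in enumerate(ranked, start=1) if word_class[w] == cls]


def permutation_p_value(ranked, word_class, cls, trials=2000, seed=0):
    """Two-sided Monte Carlo p-value for the mean rank of class `cls`.

    Null hypothesis: the class occupies a uniformly random subset of ranks.
    The mean-rank statistic is an illustrative choice, not taken from the paper.
    """
    observed = class_ranks(ranked, word_class, cls)
    n, k = len(ranked), len(observed)
    if k == 0:
        raise ValueError(f"no words of class {cls!r} in the vocabulary")
    expected = (n + 1) / 2  # mean rank of a uniformly random subset
    observed_dev = abs(sum(observed) / k - expected)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(1, n + 1), k)
        if abs(sum(sample) / k - expected) >= observed_dev:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Toy input; a real analysis would use a full tagged literary text.
tokens = (
    [("the", "other")] * 5
    + [("cat", "noun")] * 3
    + [("runs", "verb")] * 2
    + [("dog", "noun")]
)
ranked, word_class = rank_vocabulary(tokens)
noun_ranks = class_ranks(ranked, word_class, "noun")
p_noun = permutation_p_value(ranked, word_class, "noun")
```

On a vocabulary this small the p-value carries no weight; the sketch only shows how a class's observed ranks can be set against a uniform-placement baseline.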
