CL | Sep 20, 2022

Register Variation Remains Stable Across 60 Languages

arXiv:2209.09813v1 | 11 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the fundamental question of linguistic universality for researchers in linguistics and computational linguistics, providing empirical evidence for a stable relationship between context and language features.

The paper tests whether register variation is universal by comparing linguistic features of tweets and Wikipedia articles across 60 languages. It finds that the variation is stable across languages, confirming the hypothesis.

This paper measures the stability of cross-linguistic register variation. A register is a variety of a language that is associated with extra-linguistic context. The relationship between a register and its context is functional: the linguistic features that make up a register are motivated by the needs and constraints of the communicative situation. This view hypothesizes that register should be universal, so that we expect a stable relationship between the extra-linguistic context that defines a register and the sets of linguistic features which the register contains. In this paper, the universality and robustness of register variation are tested by comparing variation within vs. between register-specific corpora in 60 languages using corpora produced in comparable communicative situations: tweets and Wikipedia articles. Our findings confirm the prediction that register variation is, in fact, universal.
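
The within vs. between comparison described in the abstract can be illustrated with a small sketch. This is not the paper's method or feature set: the three surface features, the Euclidean distance, the helper names (`features`, `within_between`), and the toy sentences are all assumptions made up for illustration. The idea is only that, if registers differ, texts from the same register should sit closer together in feature space than texts from different registers.

```python
from itertools import combinations, product
from math import sqrt


def features(text):
    """Toy feature vector (an assumption, not the paper's features):
    mean word length, type-token ratio, share of symbol characters."""
    words = text.split()
    n_words = len(words)
    mean_word_len = sum(len(w) for w in words) / n_words
    type_token_ratio = len({w.lower() for w in words}) / n_words
    symbol_share = sum(not (c.isalnum() or c.isspace()) for c in text) / len(text)
    return (mean_word_len, type_token_ratio, symbol_share)


def distance(a, b):
    """Euclidean distance between two feature vectors (unscaled)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def within_between(samples_a, samples_b):
    """Return (mean within-register distance, mean between-register distance).

    Each register needs at least two samples so that within-register
    pairs exist.
    """
    fa = [features(t) for t in samples_a]
    fb = [features(t) for t in samples_b]
    within = mean(
        [distance(x, y) for x, y in combinations(fa, 2)]
        + [distance(x, y) for x, y in combinations(fb, 2)]
    )
    between = mean(distance(x, y) for x, y in product(fa, fb))
    return within, between


# Hypothetical toy samples standing in for the two registers.
tweets = ["omg so tired lol!!", "cant wait 4 the game #hype", "who else is up rn??"]
wiki = [
    "The river originates in the northern highlands and flows south.",
    "The municipality was established following administrative reforms.",
    "Photosynthesis converts light energy into chemical energy.",
]

within, between = within_between(tweets, wiki)
print(f"within={within:.3f} between={between:.3f}")
```

Because the features are left unscaled, mean word length dominates the distance here; a real comparison would standardize the features and use far larger samples per language.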

Code Implementations: 1 repo
