Register Variation Remains Stable Across 60 Languages
This work addresses a fundamental question of linguistic universality that matters to researchers in linguistics and computational linguistics: it provides empirical evidence for a stable relationship between extra-linguistic context and linguistic features.
The paper asks whether register variation is universal across languages. It compares linguistic features in tweets and Wikipedia articles across 60 languages and finds that register variation is universal, confirming the hypothesis.
This paper measures the stability of cross-linguistic register variation. A register is a variety of a language that is associated with extra-linguistic context. The relationship between a register and its context is functional: the linguistic features that make up a register are motivated by the needs and constraints of the communicative situation. This functional view predicts that register should be universal, so we expect a stable relationship between the extra-linguistic context that defines a register and the set of linguistic features that the register contains. In this paper, the universality and robustness of register variation are tested by comparing variation within vs. between register-specific corpora in 60 languages, using corpora produced in comparable communicative situations: tweets and Wikipedia articles. Our findings confirm the prediction that register variation is, in fact, universal.
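The within vs. between comparison can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the linguistic features or the similarity measure, so the sketch uses cosine similarity over made-up three-dimensional feature vectors, and the helper names (`cosine`, `mean`, `within_between`) are hypothetical. The idea shown is only that samples from the same register should resemble each other more than samples from different registers do.

```python
from itertools import combinations, product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def within_between(register_a, register_b):
    """Return (mean within-register similarity, mean between-register similarity).

    Each argument is a list of feature vectors, one per corpus sample.
    """
    # Within: every pair of samples drawn from the same register.
    within = mean(
        cosine(u, v)
        for group in (register_a, register_b)
        for u, v in combinations(group, 2)
    )
    # Between: every pair with one sample from each register.
    between = mean(cosine(u, v) for u, v in product(register_a, register_b))
    return within, between


# Toy feature vectors (invented values, not data from the paper).
tweets = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.85, 0.15, 0.35]]
wiki = [[0.2, 0.9, 0.6], [0.1, 0.8, 0.7], [0.15, 0.85, 0.65]]

within, between = within_between(tweets, wiki)
print(f"within={within:.3f} between={between:.3f}")
```

Under this toy setup, a register distinction counts as stable in a language when the within-register similarity exceeds the between-register similarity; repeating the comparison for each language would give one such pair of values per language.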