Word class flexibility: A deep contextualized approach
This work addresses the challenge of accurately measuring word class flexibility for linguistic typology, offering a scalable method that reveals cross-linguistic patterns.
The authors quantify word class flexibility across languages by using contextualized word embeddings to measure semantic shift between grammatical categories. Applying this method to 37 languages, they find that flexible lemmas show greater semantic variation in their dominant word class, which supports a directional view of flexibility.
Word class flexibility refers to the phenomenon whereby a single word form is used across different grammatical categories. Extensive work in linguistic typology has sought to characterize word class flexibility across languages, but quantifying this phenomenon accurately and at scale has been fraught with difficulties. We propose a principled methodology to explore regularity in word class flexibility. Our method builds on recent work in contextualized word embeddings to quantify semantic shift between word classes (e.g., noun-to-verb, verb-to-noun), and we apply this method to 37 languages. We find that contextualized embeddings not only capture human judgment of class variation within words in English, but also uncover shared tendencies in class flexibility across languages. Specifically, we find greater semantic variation when flexible lemmas are used in their dominant word class, supporting the view that word class flexibility is a directional process. Our work highlights the utility of deep contextualized models in linguistic typology.
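To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each usage of a flexible lemma has already been mapped to a contextualized embedding vector and tagged as a noun or a verb. The function names, the toy vectors, and the specific metrics (cosine distance between class centroids for semantic shift, mean Euclidean distance to the class centroid for semantic variation) are illustrative assumptions; the abstract does not specify them.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_distance(u, v):
    """1 minus the cosine similarity of u and v (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_shift(noun_vecs, verb_vecs):
    """Assumed metric: cosine distance between the noun and verb centroids."""
    return cosine_distance(mean_vector(noun_vecs), mean_vector(verb_vecs))

def semantic_variation(vectors):
    """Assumed metric: mean Euclidean distance of usages from their centroid."""
    centroid = mean_vector(vectors)
    return sum(math.dist(v, centroid) for v in vectors) / len(vectors)

# Hypothetical 2-dimensional "embeddings" for one flexible lemma.
# Real contextualized embeddings have hundreds of dimensions.
noun_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # usages tagged as noun
verb_vecs = [[0.9, 0.1], [1.0, 0.0]]              # usages tagged as verb

shift = semantic_shift(noun_vecs, verb_vecs)
noun_variation = semantic_variation(noun_vecs)
verb_variation = semantic_variation(verb_vecs)
```

In this toy example the noun usages are more spread out than the verb usages, which is the kind of asymmetry the abstract reports for the dominant word class. Comparing `noun_variation` with `verb_variation` across many lemmas and languages would be one way to test for that pattern under these assumed metrics.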