Patterns of Persistence and Diffusibility across the World's Languages
This work addresses the challenge of distinguishing genetic relatedness, areal contact, universality, and chance as sources of similarity between languages. It provides a resource for further research in multilingual NLP and comparative linguistics, although the contribution is incremental because it builds on existing hypotheses and methods.
The authors investigate the linguistic causes of cross-lingual similarity in colexification and phonology by analyzing genealogical stability and contact-induced change. They construct a large-scale graph resource covering 1,966 languages and use it to test hypotheses; the results strongly support one established hypothesis and contradict another.
Language similarities can be caused by genetic relatedness, areal contact, universality, or chance. Colexification, a type of similarity in which a single lexical form conveys multiple meanings, is underexplored. In our work, we shed light on the linguistic causes of cross-lingual similarity in colexification and phonology by exploring genealogical stability (persistence) and contact-induced change (diffusibility). We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages. We then show the potential of this resource by investigating several established hypotheses from previous work in linguistics and proposing new ones. Our results strongly support one previously established hypothesis in the linguistic literature, while offering evidence that contradicts another. Our large-scale resource opens up avenues for further research across disciplines, e.g. in multilingual NLP and comparative linguistics.
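To make the notion of colexification concrete, the following is a minimal sketch of how colexified concept pairs could be collected into a cross-lingual graph. It is not the authors' pipeline: the toy lexicon, the language and form names, and the function `colexification_graph` are all hypothetical, and the real resource additionally incorporates genealogical, phonological and geographical data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon (not data from the paper):
# language -> lexical form -> set of concepts that form expresses.
toy_lexicon = {
    "lang_a": {"foo": {"TREE", "WOOD"}, "bar": {"FIRE"}},
    "lang_b": {"baz": {"TREE", "WOOD"}, "qux": {"FIRE", "WOOD"}},
    "lang_c": {"zap": {"TREE"}, "zip": {"WOOD"}},
}


def colexification_graph(lexicon):
    """Map each colexified concept pair to the languages attesting it.

    Two concepts are colexified in a language when a single form
    expresses both. Each pair is stored in sorted order so that
    (A, B) and (B, A) count as the same edge.
    """
    edges = defaultdict(set)
    for language, forms in lexicon.items():
        for concepts in forms.values():
            for a, b in combinations(sorted(concepts), 2):
                edges[(a, b)].add(language)
    return dict(edges)


graph = colexification_graph(toy_lexicon)
for pair, languages in sorted(graph.items()):
    print(pair, sorted(languages))
```

In this sketch an edge's language set is the raw material for asking whether a colexification pattern clusters within a family (persistence) or across neighbouring, unrelated languages (diffusibility); answering that would require the genealogical and geographical annotations that the toy data omits.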