CL · Apr 3, 2021

Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

arXiv:2104.01306v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for generalized models of regional variation in computational sociolinguistics. It is an incremental advance: it builds on existing methods and extends them to multiple languages and global data.

The paper tackles the problem of representing regional linguistic variation on a global scale by removing three constraints that have limited prior work in dialectology. It uses Computational Construction Grammar to define syntactic features, global language mapping to select national varieties, and the same methods across seven languages. It finds that models using Construction Grammars predict the region-of-origin of held-out samples more robustly than models using simpler syntactic features.

The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that, for each language, models using Construction Grammars predict the region-of-origin of held-out samples more robustly than models using simpler syntactic features. These global-scale experiments support the argument that new methods in computational sociolinguistics can provide more generalized models of regional variation, which are essential for understanding language variation and change at scale.
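The evaluation setup described above (predicting the region-of-origin of held-out samples from syntactic feature frequencies) can be illustrated with a minimal sketch. This is not the authors' classifier: it assumes each sample has already been reduced to a bag of construction counts, and it swaps in a simple nearest-centroid classifier with cosine similarity. The construction names (`cxn_1`, `cxn_2`, `cxn_3`), region labels, and counts are hypothetical toy values, not data from the paper.

```python
import math
from collections import Counter, defaultdict


def normalize(counts):
    """Turn raw construction counts into relative frequencies so every sample weighs equally."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(samples):
    """samples: list of (feature_counts, region). Returns region -> mean relative-frequency vector."""
    sums = defaultdict(Counter)
    n = Counter()
    for counts, region in samples:
        for f, v in normalize(counts).items():
            sums[region][f] += v
        n[region] += 1
    return {r: {f: v / n[r] for f, v in s.items()} for r, s in sums.items()}


def predict(centroids, counts):
    """Assign a sample to the region whose centroid is most similar."""
    vec = normalize(counts)
    return max(centroids, key=lambda r: cosine(vec, centroids[r]))


def accuracy(centroids, held_out):
    """Fraction of held-out samples whose region-of-origin is predicted correctly."""
    correct = sum(predict(centroids, c) == r for c, r in held_out)
    return correct / len(held_out)


# Hypothetical toy data: construction counts per sample, labelled with a region.
train = [
    ({"cxn_1": 8, "cxn_2": 1, "cxn_3": 1}, "region_A"),
    ({"cxn_1": 7, "cxn_2": 2, "cxn_3": 1}, "region_A"),
    ({"cxn_1": 1, "cxn_2": 2, "cxn_3": 7}, "region_B"),
    ({"cxn_1": 2, "cxn_2": 1, "cxn_3": 7}, "region_B"),
]
held_out = [
    ({"cxn_1": 6, "cxn_2": 2, "cxn_3": 2}, "region_A"),
    ({"cxn_1": 1, "cxn_2": 3, "cxn_3": 6}, "region_B"),
]

centroids = train_centroids(train)
print(accuracy(centroids, held_out))  # 1.0 on this toy data
```

To mirror the paper's comparison in spirit, the same `accuracy` function could be run once on construction-based feature counts and once on a simpler feature set (for example, part-of-speech n-gram counts) extracted from the same samples; the feature extraction itself is outside the scope of this sketch.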
