CL · Apr 11, 2019

Modeling Global Syntactic Variation in English Using Dialect Classification

arXiv:1904.05527v1

Originality: Incremental advance

AI Analysis

This work addresses a problem of interest to linguists and NLP researchers: understanding how English syntax varies across dialects. It appears to be an incremental advance that builds on existing dialect classification methods.

The paper studies global syntactic variation in English by evaluating dialect identification for 14 national varieties. It uses data-driven language mapping to choose those varieties, derives a large set of syntactic features through grammar induction, and compares models across web and social media corpora to measure how robust the variation is across registers.

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
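As a rough illustration of contribution (iii), the sketch below trains a dialect classifier on syntactic feature vectors from one register (web) and scores it on another (social media). Everything here is an assumption for illustration: the construction names, toy counts, and country labels are hypothetical, and a simple nearest-centroid classifier is swapped in for readability. It is not necessarily the classifier or feature set the paper uses, and real grammar-induced inventories would be far larger.

```python
import math
from collections import Counter

# Hypothetical stand-ins for syntactic constructions produced by grammar induction.
FEATURES = ["NP_of_NP", "VERB_PRT", "AUX_NOT_VERB", "DET_ADJ_NOUN"]

def to_vector(construction_counts):
    """Convert raw construction counts for one sample into relative frequencies."""
    total = sum(construction_counts.get(f, 0) for f in FEATURES)
    if total == 0:
        return [0.0] * len(FEATURES)
    return [construction_counts.get(f, 0) / total for f in FEATURES]

def train_centroids(samples):
    """samples: list of (dialect_label, counts). Returns label -> mean frequency vector."""
    sums, n = {}, Counter()
    for label, counts in samples:
        acc = sums.setdefault(label, [0.0] * len(FEATURES))
        for i, v in enumerate(to_vector(counts)):
            acc[i] += v
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, counts):
    """Assign the dialect whose centroid is closest in Euclidean distance."""
    vec = to_vector(counts)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

def accuracy(centroids, samples):
    correct = sum(predict(centroids, counts) == label for label, counts in samples)
    return correct / len(samples)

# Toy, made-up data: train on one register, test on another.
web_train = [
    ("IN", {"NP_of_NP": 8, "VERB_PRT": 1, "AUX_NOT_VERB": 2, "DET_ADJ_NOUN": 4}),
    ("IN", {"NP_of_NP": 7, "VERB_PRT": 2, "AUX_NOT_VERB": 2, "DET_ADJ_NOUN": 4}),
    ("AU", {"NP_of_NP": 2, "VERB_PRT": 6, "AUX_NOT_VERB": 3, "DET_ADJ_NOUN": 4}),
    ("AU", {"NP_of_NP": 3, "VERB_PRT": 7, "AUX_NOT_VERB": 2, "DET_ADJ_NOUN": 3}),
]
social_test = [
    ("IN", {"NP_of_NP": 6, "VERB_PRT": 2, "AUX_NOT_VERB": 1, "DET_ADJ_NOUN": 3}),
    ("AU", {"NP_of_NP": 1, "VERB_PRT": 5, "AUX_NOT_VERB": 2, "DET_ADJ_NOUN": 2}),
]

centroids = train_centroids(web_train)
print(accuracy(centroids, social_test))  # → 1.0
```

Comparing in-register accuracy (train and test on web) with cross-register accuracy (train on web, test on social media) is one way to gauge whether the learned syntactic differences hold up across registers; with realistic data the cross-register score would typically be lower.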

