CL · SI · SOC-PH · Oct 18, 2012

Diffusion of Lexical Change in Social Media

arXiv:1210.5268v4 · 267 citations
Originality: Synthesis-oriented
AI Analysis

This research examines how social media influences language evolution and finds that it reinforces existing demographic divides rather than creating a unified dialect. The contribution is incremental in that it supports prior arguments with new data.

The study analyzed 107 million tweets to investigate how language changes spread across the U.S., finding that demographic similarity, particularly racial similarity, plays a more central role than geographical proximity or population size in determining which cities share linguistic influence.

Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
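In a vector autoregressive (VAR) model, each city's current value depends linearly on every city's value at the previous time step, so an off-diagonal coefficient A[i][j] can be read as "influence of city j on city i". The sketch below illustrates only that idea with a plain, fully observed VAR(1), x_t = A x_{t-1} + e_t, fit by ordinary least squares. It is not the paper's model: the paper treats activation as latent, aggregates across thousands of words, and relates influence to demographic and geographic predictors, none of which is reproduced here. All names (`simulate_var1`, `fit_var1`, `solve`, `A_true`) and the toy three-city matrix are hypothetical.

```python
import random


def simulate_var1(A, steps, noise_sd, seed=0):
    """Simulate x_t = A x_{t-1} + e_t for a hypothetical K-city activation vector.

    A[i][j] is the (assumed) influence of city j's previous value on city i.
    """
    rng = random.Random(seed)
    k = len(A)
    x = [0.0] * k
    series = [x]
    for _ in range(steps):
        # New list each step: lagged influence from all cities plus Gaussian noise
        x = [sum(A[i][j] * x[j] for j in range(k)) + rng.gauss(0.0, noise_sd)
             for i in range(k)]
        series.append(x)
    return series


def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y


def fit_var1(series):
    """OLS estimate of A: row i regresses x_t[i] on the full lagged vector x_{t-1}."""
    k = len(series[0])
    prev, curr = series[:-1], series[1:]
    # Gram matrix of lagged values, shared by every row's regression
    gram = [[sum(p[a] * p[b] for p in prev) for b in range(k)] for a in range(k)]
    A_hat = []
    for i in range(k):
        rhs = [sum(p[a] * c[i] for p, c in zip(prev, curr)) for a in range(k)]
        A_hat.append(solve(gram, rhs))
    return A_hat


# Toy example: city 1 influences city 0; city 2 evolves on its own.
A_true = [[0.5, 0.3, 0.0],
          [0.0, 0.6, 0.0],
          [0.0, 0.0, 0.4]]
series = simulate_var1(A_true, steps=5000, noise_sd=1.0, seed=0)
A_hat = fit_var1(series)
for row in A_hat:
    print([round(v, 2) for v in row])
```

With 5,000 simulated steps, `A_hat[0][1]` should land near the true 0.3 and `A_hat[1][0]` near zero, recovering the direction of influence. A real analysis of word counts would additionally need an observation model linking noisy, sampled counts to the latent activations, which this sketch deliberately omits.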

