CL · Oct 16, 2012

Gender identity and lexical variation in social media

arXiv:1210.4567v2 · 439 citations
AI Analysis

This research argues against treating gender as a simple female/male binary in computational linguistics, and offers social scientists and NLP researchers a data-driven account of how gendered language varies across styles and topical interests.

The study analyzed 14,000 Twitter users to examine how gender relates to linguistic style and social networks. It found that individuals whose language does not match a classifier's model for their gender have significantly fewer same-gender social connections, and that, in general, social network homophily is correlated with the use of same-gender language markers.

We present a study of the relationship between gender, linguistic style, and social networks, using a novel corpus of 14,000 Twitter users. Prior quantitative work on gender often treats this social variable as a female/male binary; we argue for a more nuanced approach. By clustering Twitter users, we find a natural decomposition of the dataset into various styles and topical interests. Many clusters have strong gender orientations, but their use of linguistic resources sometimes directly conflicts with the population-level language statistics. We view these clusters as a more accurate reflection of the multifaceted nature of gendered language styles. Previous corpus-based work has also had little to say about individuals whose linguistic styles defy population-level gender patterns. To identify such individuals, we train a statistical classifier, and measure the classifier confidence for each individual in the dataset. Examining individuals whose language does not match the classifier's model for their gender, we find that they have social networks that include significantly fewer same-gender social connections and that, in general, social network homophily is correlated with the use of same-gender language markers. Pairing computational methods and social theory thus offers a new perspective on how gender emerges as individuals position themselves relative to audiences, topics, and mainstream gender norms.
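The classifier-confidence analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract says only "a statistical classifier", so the choice of logistic regression over word counts is an assumption, and the users, placeholder tokens (`w1`, `w2`, `w3`), labels, and friend lists are all invented toy data.

```python
import math
from collections import Counter

# Hypothetical toy data: tokens, binary labels, and friend lists are invented
# for illustration and are not taken from the paper's corpus.
USERS = {
    "u1": {"text": "w1 w1 w3", "label": 0, "friends": ["u2", "u5"]},
    "u2": {"text": "w1 w3", "label": 0, "friends": ["u1", "u3"]},
    "u3": {"text": "w2 w2 w3", "label": 1, "friends": ["u4", "u2"]},
    "u4": {"text": "w2 w2 w3", "label": 1, "friends": ["u3", "u5"]},
    "u5": {"text": "w2 w2 w3", "label": 0, "friends": ["u3", "u4"]},
}
VOCAB = ["w1", "w2", "w3"]


def featurize(text):
    """Bag-of-words count vector over the fixed toy vocabulary."""
    counts = Counter(text.split())
    return [float(counts[w]) for w in VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=2000, lr=0.3):
    """Logistic regression fit by batch gradient descent (assumed model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


names = sorted(USERS)
X = [featurize(USERS[u]["text"]) for u in names]
y = [USERS[u]["label"] for u in names]
w, b = train_logreg(X, y)

# Classifier confidence in each user's own label.
confidence = {}
for u, xi, yi in zip(names, X, y):
    p1 = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
    confidence[u] = p1 if yi == 1 else 1.0 - p1

# Fraction of each user's connections that share the user's label.
same_gender = {
    u: sum(USERS[f]["label"] == USERS[u]["label"] for f in USERS[u]["friends"])
    / len(USERS[u]["friends"])
    for u in names
}

r = pearson([confidence[u] for u in names], [same_gender[u] for u in names])
print(f"correlation between confidence and same-gender fraction: {r:.2f}")
```

Here `u5` writes like the users carrying the other label, so the classifier assigns low confidence to `u5`'s own label. On real data, the confidence scores would more plausibly come from held-out predictions rather than from the training set, as done here for brevity.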
