CL, Jan 23

Relating Word Embedding Gender Biases to Gender Gaps: A Cross-Cultural Analysis

Scott Friedman, Sonja Schmer-Galunder, Anthony Chen, Jeffrey Rye

arXiv:2601.17203v11096 citations, h-index: 7

Originality Incremental advance

AI Analysis

This work uses biases in NLP models to help researchers and policymakers understand cultural gender gaps. It is an incremental advance, since it builds on existing bias quantification methods.

The paper tackles the problem of quantifying gender bias in word embeddings and uses it to characterize statistical gender gaps in areas like education and economics, validating the approach on Twitter data from 51 U.S. regions and 99 countries and correlating biases with 23 gender gap metrics.

Modern models for common NLP tasks often employ machine learning techniques and train on journalistic, social media, or other culturally-derived text. These models have recently been scrutinized for racial and gender biases stemming from inherent bias in their training text. These biases are often sub-optimal, and recent work proposes methods to rectify them; however, they may also shed light on actual racial or gender gaps in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach for quantifying gender bias in word embeddings and then using it to characterize statistical gender gaps in education, politics, economics, and health. We validate these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries. We correlate state and country word embedding biases with 18 international and 5 U.S.-based statistical gender gaps, characterizing regularities and predictive strength.
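The abstract does not spell out how bias is measured, so the following is only a minimal sketch of one common way to do it, not necessarily the authors' method: project a target word onto a "gender direction" built from female and male anchor words, then correlate the per-region bias scores with a gender gap statistic. The function names, the anchor words, the toy 2-D embeddings, and the gap values are all hypothetical and synthetic, chosen only to make the example run.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def gender_direction(emb, female_words, male_words):
    """Unit vector pointing from the male anchor mean to the female anchor mean."""
    f = np.mean([emb[w] for w in female_words], axis=0)
    m = np.mean([emb[w] for w in male_words], axis=0)
    return unit(f - m)


def word_bias(emb, word, direction):
    """Cosine similarity between a word vector and the gender direction."""
    return float(np.dot(unit(emb[word]), direction))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))


def toy_region_embedding(shift):
    """Synthetic 2-D embedding for one region (assumption: illustration only)."""
    return {
        "she": np.array([1.0, 0.0]),
        "he": np.array([-1.0, 0.0]),
        "engineer": np.array([shift, 1.0]),
    }


# Synthetic regions: each has its own embedding and a made-up gap statistic.
shifts = [-0.5, 0.0, 0.5, 1.0]
gaps = [0.10, 0.20, 0.35, 0.40]

biases = []
for s in shifts:
    emb = toy_region_embedding(s)
    d = gender_direction(emb, ["she"], ["he"])
    biases.append(word_bias(emb, "engineer", d))

print("per-region bias:", biases)
print("correlation with gap statistic:", pearson(biases, gaps))
```

In a real analysis each region would have its own embedding trained on that region's text, and the anchor and target word lists would be far larger than the single words used here.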
