CY AI CL LG
Jun 3, 2022

Measuring Gender Bias in Word Embeddings of Gendered Languages Requires Disentangling Grammatical Gender Signals

arXiv:2206.01691v15.118 citations
h-index: 25
Has Code

Originality: Incremental advance

AI Analysis

The paper addresses inaccurate gender bias assessments in NLP for gendered languages. The advance is incremental because it builds on existing bias measurement methods.

The paper tackles the problem of grammatical gender interfering with social gender bias measurements in word embeddings of gendered languages. It introduces post-processing methods to disentangle these signals, which reduce the strength of the grammatical gender signal (e.g., by an average effect size of d = 1.3 for French, German, and Italian) and make WEAT bias results more congruent with country-level implicit bias measurements.

Does the grammatical gender of a language interfere when measuring the semantic gender information captured by its word embeddings? A number of anomalous gender bias measurements in the embeddings of gendered languages suggest this possibility. We demonstrate that word embeddings learn the association between a noun and its grammatical gender in grammatically gendered languages, which can skew social gender bias measurements. Consequently, word embedding post-processing methods are introduced to quantify, disentangle, and evaluate grammatical gender signals. The evaluation is performed on five gendered languages from the Germanic, Romance, and Slavic branches of the Indo-European language family. Our method reduces the strength of grammatical gender signals, which is measured in terms of effect size (Cohen's d), by a significant average of d = 1.3 for French, German, and Italian, and d = 0.56 for Polish and Spanish. Once grammatical gender is disentangled, the association between over 90% of 10,000 inanimate nouns and their assigned grammatical gender weakens, and cross-lingual bias results from the Word Embedding Association Test (WEAT) become more congruent with country-level implicit bias measurements. The results further suggest that disentangling grammatical gender signals from word embeddings may lead to improvement in semantic machine learning tasks.
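The abstract reports effect sizes (Cohen's d) and WEAT results without showing how such a score is computed. As a rough illustration only, the standard WEAT effect size from the wider bias-measurement literature can be sketched as follows. The function names and toy inputs are hypothetical, the use of the sample standard deviation is an assumption, and this is neither the paper's code nor its disentangling method.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Cohen's-d-style WEAT effect size.

    Difference between the mean associations of the two target sets,
    divided by the standard deviation of the associations over all targets.
    Assumption: sample standard deviation (statistics.stdev) is used here.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)
```

In practice the four arguments would be lists of embedding vectors looked up for the target and attribute word lists. A positive score indicates that the first target set is more strongly associated with the first attribute set than the second target set is.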
