CL · Jan 28, 2024

UnMASKed: Quantifying Gender Biases in Masked Language Models through Linguistically Informed Job Market Prompts

arXiv:2401.15798v1 · 104 citations · h-index: 1 · EACL
Originality: Incremental advance
AI Analysis

This work addresses societal biases in language models, a concern for both users and developers working in AI ethics. It is an incremental advance that builds on existing bias-detection methods.

The study quantified gender biases in six masked language models by analyzing their responses to linguistically informed job market prompts, finding that all models exhibited stereotypical gender alignments, with multilingual variants showing comparatively reduced biases.

Language models (LMs) have become pivotal in the realm of technological advancements. While their capabilities are vast and transformative, they often include societal biases encoded in the human-produced datasets used for their training. This research delves into the inherent biases present in masked language models (MLMs), with a specific focus on gender biases. This study evaluated six prominent models: BERT, RoBERTa, DistilBERT, BERT-multilingual, XLM-RoBERTa, and DistilBERT-multilingual. The methodology employed a novel dataset, bifurcated into two subsets: one containing prompts that encouraged models to generate subject pronouns in English, and the other requiring models to return the probabilities of verbs, adverbs, and adjectives linked to the prompts' gender pronouns. The analysis reveals stereotypical gender alignment of all models, with multilingual variants showing comparatively reduced biases.
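The first prompt subset described above (prompts that lead a model to fill in an English subject pronoun) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prompt wording, the `MaskScorer` interface, the `toy_scorer` stand-in with made-up probabilities, and the normalized-gap score, which is not necessarily the metric used in the paper. In practice the scorer would wrap a real masked language model that returns probabilities for the candidate tokens at the masked position.

```python
from typing import Callable, Dict, List

# Hypothetical interface: given a prompt containing a mask token and a list of
# candidate fill-ins, return the model's probability for each candidate.
MaskScorer = Callable[[str, List[str]], Dict[str, float]]


def pronoun_gap(prompt: str, scorer: MaskScorer) -> float:
    """Return a score in [-1, 1]: positive favors "he", negative favors "she".

    This normalized difference is an illustrative choice, not the paper's metric.
    """
    probs = scorer(prompt, ["he", "she"])
    p_he, p_she = probs["he"], probs["she"]
    total = p_he + p_she
    if total == 0:
        return 0.0  # the model assigns no mass to either pronoun
    return (p_he - p_she) / total


def toy_scorer(prompt: str, candidates: List[str]) -> Dict[str, float]:
    """Stand-in for a masked LM. The numbers are invented, not measured."""
    table = {
        "engineer": {"he": 0.75, "she": 0.25},
        "nurse": {"he": 0.125, "she": 0.375},
    }
    for job, probs in table.items():
        if job in prompt:
            return {c: probs.get(c, 0.0) for c in candidates}
    return {c: 0.0 for c in candidates}


if __name__ == "__main__":
    for job in ("engineer", "nurse"):
        prompt = f"[MASK] works as a {job}."  # hypothetical prompt template
        print(job, pronoun_gap(prompt, toy_scorer))
```

Keeping the scorer as an injected function means the same scoring code could be run against each of the six models by swapping in a different scorer, which makes the monolingual and multilingual variants directly comparable.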

