CL · Feb 21, 2024

Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality

arXiv:2402.13954v3 · 4 citations · h-index: 1 · ACL

Originality: Incremental advance

AI Analysis

This paper addresses the problem of detecting harmful social biases in language models, a concern for AI fairness researchers, though it is an incremental advance on existing bias measurement work.

The researchers measured social biases in masked language models by analyzing prediction quality through iterative masking experiments, finding all models encode concerning biases. They showed their proposed proxy functions produce more accurate bias estimations after retraining than existing methods, particularly for biases toward disadvantaged groups.

Transformer language models have achieved state-of-the-art performance for a variety of natural language tasks but have been shown to encode unwanted biases. We evaluate the social biases encoded by transformers trained with the masked language modeling objective using proposed proxy functions within an iterative masking experiment to measure the quality of transformer models' predictions and assess the preference of MLMs towards disadvantaged and advantaged groups. We find all models encode concerning social biases. We compare bias estimations with those produced by other evaluation methods using benchmark datasets and assess their alignment with human annotated biases. We extend previous work by evaluating social biases introduced after retraining an MLM under the masked language modeling objective and find proposed measures produce more accurate and sensitive estimations of biases introduced by retraining MLMs based on relative preference for biased sentences between models, while other methods tend to underestimate biases after retraining on sentences biased towards disadvantaged groups.
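The abstract does not define the proxy functions, so the following is only a minimal sketch of the general idea of an iterative masking experiment: mask each token in turn, record how well the model recovers it, and compare the resulting scores across paired sentences. The names `iterative_masking_score`, `preference_rate`, and `toy_prob` are hypothetical, the mean log-probability is an assumed stand-in for the paper's measures, and `toy_prob` is a placeholder for a real masked language model.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical interface: given tokens with one position masked, return the
# model's probability of the original token at that position.
MaskedProb = Callable[[Sequence[str], int, str], float]


def iterative_masking_score(tokens: Sequence[str], prob_fn: MaskedProb) -> float:
    """Mask each token in turn and average the log-probability of the original.

    Assumed stand-in for a prediction-quality measure, not the paper's own.
    """
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK
        total += math.log(prob_fn(masked, i, target))
    return total / len(tokens)


def preference_rate(
    pairs: List[Tuple[Sequence[str], Sequence[str]]], prob_fn: MaskedProb
) -> float:
    """Fraction of pairs whose first sentence scores higher than the second."""
    wins = sum(
        iterative_masking_score(first, prob_fn)
        > iterative_masking_score(second, prob_fn)
        for first, second in pairs
    )
    return wins / len(pairs)


def toy_prob(masked_tokens: Sequence[str], position: int, target: str) -> float:
    """Placeholder for a real MLM: fixed probabilities per target token."""
    table = {"GROUP_A": 0.2, "GROUP_B": 0.1}
    return table.get(target, 0.5)


pairs = [
    (["the", "GROUP_A", "engineer"], ["the", "GROUP_B", "engineer"]),
]
rate = preference_rate(pairs, toy_prob)
```

With a real model, `prob_fn` would wrap a forward pass over the masked sentence. A rate far from 0.5 over a benchmark of paired sentences would suggest a systematic preference for one group's sentences.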

