CL · Jun 20, 2022

Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias

arXiv:2206.09860v1 · 638 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work, aimed at researchers and practitioners, examines whether large NLP models pick up more social biases as they scale, and highlights the risks that can arise from increasing model size.

The study investigated the relationship between model size and gender bias in masked language models. Larger models receive higher bias scores under direct prompting but make fewer gender errors on the downstream Winogender task. However, their mistakes are more likely to be caused by gender bias, and the share of stereotypical errors grows with model size.

The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between model size and its gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two setups: directly, using a prompt-based method, and using a downstream task (Winogender). On the one hand, we find that larger models receive higher bias scores on the former task; on the other hand, when evaluated on the latter, they make fewer gender errors. To examine these potentially conflicting results, we carefully investigate the behavior of the different models on Winogender. We find that while larger models outperform smaller ones, the probability that their mistakes are caused by gender bias is higher. Moreover, we find that the proportion of stereotypical errors compared to anti-stereotypical ones grows with the model size. Our findings highlight the potential risks that can arise from increasing model size.
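The last finding compares stereotypical and anti-stereotypical errors. The snippet below is a minimal sketch of one way to compute that proportion, assuming each gender error on Winogender has already been labeled by whether its direction agrees with the occupational stereotype. The label strings and the function name `stereotypical_error_share` are hypothetical and not taken from the paper, whose exact metric may differ (for example, it could be reported as a ratio rather than a share).

```python
from collections import Counter
from typing import Iterable

# Hypothetical label names; the paper's actual annotation scheme may differ.
STEREO = "stereotypical"
ANTI = "anti-stereotypical"


def stereotypical_error_share(error_labels: Iterable[str]) -> float:
    """Return the fraction of gender errors that agree with the stereotype.

    Each label describes one error: STEREO if the error's direction agrees
    with the occupational gender stereotype, ANTI if it goes the other way.
    """
    counts = Counter(error_labels)
    unknown = set(counts) - {STEREO, ANTI}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = counts[STEREO] + counts[ANTI]
    if total == 0:
        raise ValueError("no errors to classify")
    return counts[STEREO] / total


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    toy_errors = [STEREO, STEREO, STEREO, ANTI]
    print(stereotypical_error_share(toy_errors))  # 0.75
```

Computing this share separately for each model size in a family would show whether it grows with size, which is the trend the abstract reports.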

Code Implementations: 1 repo