CL · May 22, 2023

Language-Agnostic Bias Detection in Language Models with Bias Probing

arXiv:2305.13302v2 · 133 citations · Has Code
Originality: Incremental advance
AI Analysis

Aimed at NLP researchers and practitioners, this work addresses the problem of detecting social biases in language models. It offers a more reliable method than existing fill-the-mask approaches, though its improvement to bias quantification is incremental.

The authors tackled the challenge of quantifying social biases in pretrained language models by proposing LABDet, a robust and language-agnostic bias probing technique. They showed that nationality bias patterns are consistent across monolingual PLMs in six languages and that, for English BERT, the bias LABDet surfaces correlates with bias in the pretraining data.
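The correlation claim comes down to comparing two per-nationality scores: the bias the probe surfaces, and a bias measure computed from the pretraining corpus. The sketch below assumes both scores are already available as `{nationality: score}` dicts. It uses Spearman rank correlation, which is an assumption: the summary does not name the statistic the authors used. All function names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    arr = np.asarray(values, dtype=float)
    order = np.argsort(arr, kind="mergesort")  # stable sort keeps ties adjacent
    sorted_vals = arr[order]
    ranks = np.empty(len(arr))
    i = 0
    while i < len(arr):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(arr) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def bias_data_correlation(probe_bias, corpus_bias):
    """Align two hypothetical {nationality: score} dicts on shared keys and correlate them."""
    shared = sorted(set(probe_bias) & set(corpus_bias))
    if len(shared) < 2:
        raise ValueError("need at least two shared nationalities")
    return spearman([probe_bias[k] for k in shared],
                    [corpus_bias[k] for k in shared])
```

For example, `bias_data_correlation({"nat_a": 0.1, "nat_b": -0.2}, {"nat_a": 0.3, "nat_b": -0.1})` ranks both lists identically, so it yields a correlation of 1. These values are made-up placeholders, not results from the paper. A rank-based statistic is used here because the two scores live on different scales (classifier probabilities versus corpus statistics), so only their ordering is compared.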

Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases. Quantifying these biases is challenging because current methods focusing on fill-the-mask objectives are sensitive to slight changes in input. To address this, we propose a bias probing technique called LABDet for evaluating social bias in PLMs with a robust and language-agnostic method. Using nationality as a case study, we show that LABDet "surfaces" nationality bias by training a classifier on top of a frozen PLM on non-nationality sentiment detection. We find consistent patterns of nationality bias across monolingual PLMs in six languages that align with historical and political context. We also show for English BERT that bias surfaced by LABDet correlates well with bias in the pretraining data; thus, our work is one of the few studies that directly links pretraining data to PLM behavior. Finally, we verify LABDet's reliability and applicability to different templates and languages through an extensive set of robustness checks. We publicly share our code and dataset at https://github.com/akoksal/LABDet.
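The probing idea in the abstract has three steps. First, keep the PLM frozen and train only a small sentiment classifier on sentences that mention no nationality. Then fill neutral templates with each nationality. Finally, read the classifier's positive-sentiment probability as a bias signal. The sketch below is a toy illustration of that idea and is not the LABDet code from the linked repository. It makes several hypothetical substitutions:

- `frozen_encoder` is a hash-based bag-of-words stand-in for a real PLM.
- A NumPy logistic regression serves as the probe.
- The training sentences, templates, and `nat_*` placeholder tokens are invented.

Because the embeddings are random, the scores it prints carry no meaning about any real nationality.

```python
import hashlib

import numpy as np

DIM = 64  # size of the hypothetical frozen embedding


def frozen_encoder(sentence: str) -> np.ndarray:
    """Hypothetical stand-in for a frozen PLM: mean of fixed pseudo-random token vectors.

    Each token's vector is seeded from a SHA-256 hash, so the "encoder" never
    changes while the probe is trained (mimicking the frozen-PLM setting).
    """
    tokens = sentence.lower().split()
    vec = np.zeros(DIM)
    for tok in tokens:
        seed = int.from_bytes(hashlib.sha256(tok.encode("utf-8")).digest()[:4], "little")
        vec += np.random.default_rng(seed).standard_normal(DIM)
    return vec / max(len(tokens), 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(sentences, labels, lr=0.1, epochs=2000):
    """Logistic-regression sentiment probe; only w and b are learned, the encoder stays fixed."""
    X = np.stack([frozen_encoder(s) for s in sentences])
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(DIM), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of mean log-loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def positive_prob(probe, sentence):
    w, b = probe
    return float(sigmoid(frozen_encoder(sentence) @ w + b))


def nationality_bias(probe, templates, nationalities):
    """Mean P(positive) per nationality over all templates, centered on the cross-nationality mean.

    A positive value means the probe leans positive for that nationality relative to the others.
    """
    means = {
        nat: float(np.mean([positive_prob(probe, t.format(nat=nat)) for t in templates]))
        for nat in nationalities
    }
    overall = float(np.mean(list(means.values())))
    return {nat: m - overall for nat, m in means.items()}


# Toy sentiment data with no nationality words (hypothetical; not the paper's dataset).
TRAIN = [
    ("the food was great", 1), ("what a wonderful day", 1),
    ("i loved this movie", 1), ("the staff were friendly", 1),
    ("the food was awful", 0), ("what a terrible day", 0),
    ("i hated this movie", 0), ("the staff were rude", 0),
]
# Hypothetical neutral templates; {nat} is filled with each nationality token.
TEMPLATES = ["this is a {nat} person", "my neighbour is {nat}", "{nat} people live here"]
# Placeholder tokens; a real probe would use actual nationality words.
NATIONALITIES = ["nat_a", "nat_b", "nat_c", "nat_d"]

if __name__ == "__main__":
    probe = train_probe([s for s, _ in TRAIN], [label for _, label in TRAIN])
    for nat, score in sorted(nationality_bias(probe, TEMPLATES, NATIONALITIES).items()):
        print(f"{nat:>6}: {score:+.3f}")
```

Averaging over several templates is a cheap nod to the robustness concern the abstract raises: a single template's wording can dominate one probability, while the mean across templates is less sensitive to it. Centering the scores makes them read as relative preferences between nationalities rather than absolute sentiment.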
