CL · Apr 14, 2021

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

arXiv:2104.06999v2 · 665 citations
Originality: Incremental advance
AI Analysis

This paper addresses biases in content moderation that affect marginalized groups, particularly in non-Western regions. The advance is incremental because it builds on existing bias detection methods.

The paper tackles the problem of geographic biases in toxicity detection models on social media, which often perform poorly for non-Western contexts, and introduces a weakly supervised method that identifies cross-geographic error groups validated by human judgments.

Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users. However, these techniques suffer from various sampling and association biases present in training data, often resulting in sub-par performance on content relevant to marginalized groups, potentially furthering disproportionate harms towards them. Studies on such biases so far have focused on only a handful of axes of disparities and subgroups that have annotations/lexicons available. Consequently, biases concerning non-Western contexts are largely ignored in the literature. In this paper, we introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts. Through a case study on a publicly available toxicity detection model, we demonstrate that our method identifies salient groups of cross-geographic errors, and, in a follow-up, demonstrate that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts. We also conduct an analysis of a model trained on a dataset with ground truth labels to better understand these biases, and present preliminary mitigation experiments.

