Argument from Old Man's View: Assessing Social Bias in Argumentation
This research addresses social bias in computational argumentation, a problem that matters for NLP applications but has so far received little attention.
This paper investigates social bias in large English debate portals by training word embedding models on portal-specific corpora and evaluating them with the WEAT metric. The results suggest that all tested debate corpora contain unbalanced and biased data, predominantly favoring male individuals with European-American names.
Social bias in language (towards genders, ethnicities, ages, and other social groups) raises ethical problems for many NLP applications. Recent research has shown that machine learning models trained on such data may not only adopt this bias but even amplify it. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric for measuring bias in word embeddings. We then investigate the causes of this bias through a word co-occurrence analysis. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.
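To make the WEAT evaluation more concrete, the following is a minimal sketch of the Word Embedding Association Test effect size in plain Python. It assumes two target word sets X and Y (for example, two groups of first names) and two attribute word sets A and B (for example, pleasant and unpleasant words), each given as lists of embedding vectors. The function names `cosine`, `association`, and `weat_effect_size` and the 2-dimensional toy vectors are hypothetical illustrations, not the paper's code; real vectors would come from the portal-specific embedding models, and the permutation test that WEAT uses to assess significance is omitted.

```python
import math
from statistics import mean, stdev
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(
    X: Sequence[Vector], Y: Sequence[Vector], A: Sequence[Vector], B: Sequence[Vector]
) -> float:
    """Effect size: difference of the mean associations of target sets X and Y,
    normalised by the standard deviation of associations over all of X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation; some implementations use the population one.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


if __name__ == "__main__":
    # Hypothetical 2-d toy "embeddings", chosen only to illustrate the computation.
    A = [[1.0, 0.0]]              # attribute set A, e.g. pleasant words
    B = [[0.0, 1.0]]              # attribute set B, e.g. unpleasant words
    X = [[1.0, 0.1], [0.9, 0.2]]  # target set X, e.g. one group of names
    Y = [[0.1, 1.0], [0.2, 0.9]]  # target set Y, e.g. another group of names
    print(f"WEAT effect size: {weat_effect_size(X, Y, A, B):.3f}")
```

A positive effect size indicates that the words in X are, on average, more strongly associated with A (relative to B) than the words in Y; swapping X and Y flips the sign.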