CL · Sep 21, 2021

Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you?

arXiv:2109.10052v1663 citations
Originality: Incremental advance
AI Analysis

This paper addresses bias in AI and is relevant to both researchers and practitioners, though its advance is incremental: it builds on existing work in model analysis.

The paper investigates stereotypical information captured by pretrained language models, presenting the first dataset of social group stereotypes and an unsupervised method to elicit them, with experiments showing how attitudes and emotions shift across models and during fine-tuning.

In this paper, we investigate what types of stereotypical information are captured by pretrained language models. We present the first dataset comprising stereotypical attributes of a range of social groups and propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion. Moreover, we link the emergent stereotypes to their manifestation as basic emotions as a means to study their emotional effects in a more generalized manner. To demonstrate how our methods can be used to analyze emotion and stereotype shifts due to linguistic experience, we use fine-tuning on news sources as a case study. Our experiments expose how attitudes towards different social groups vary across models and how quickly emotions and stereotypes can shift at the fine-tuning stage.

