CL
Oct 16, 2023

Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts

arXiv:2310.10865v3
11 citations
h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses gender bias in language models used for fairytale comprehension, offering an incremental improvement in robustness through counterfactual training.

The study investigated whether learned gender stereotypes affect language models' story comprehension by testing their sensitivity to gender perturbations in fairytale texts. Models showed slight performance drops on perturbed test data but became more robust when fine-tuned on counterfactual data.

In this paper, we study whether language models are affected by learned gender stereotypes during the comprehension of stories. Specifically, we investigate how models respond to gender stereotype perturbations through counterfactual data augmentation. Focusing on Question Answering (QA) tasks in fairytales, we modify the FairytaleQA dataset by swapping gendered character information and introducing counterfactual gender stereotypes during training. This allows us to assess model robustness and examine whether learned biases influence story comprehension. Our results show that models exhibit slight performance drops when faced with gender perturbations in the test set, indicating sensitivity to learned stereotypes. However, when fine-tuned on counterfactual training data, models become more robust to anti-stereotypical narratives. Additionally, we conduct a case study demonstrating how incorporating counterfactual anti-stereotype examples can improve inclusivity in downstream applications.
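As a rough illustration of what "swapping gendered character information" could look like, the sketch below applies a small word-level swap to each text field of a QA example. It is not the authors' pipeline: the word list, the function names `swap_gender` and `perturb_example`, and the field names `story`, `question`, and `answer` are all assumptions made for this example. Object and possessive pronouns ("him", "his", "her") are deliberately left out, because "her" cannot be mapped back without part-of-speech information.

```python
import re

# Hypothetical, deliberately small lexicon of gendered word pairs.
# A realistic perturbation would need a far larger list and name handling.
PAIRS = [
    ("he", "she"),
    ("prince", "princess"),
    ("king", "queen"),
    ("boy", "girl"),
    ("father", "mother"),
    ("son", "daughter"),
    ("brother", "sister"),
]

# Build a symmetric lookup so each word maps to its counterpart.
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a

# Match whole words only, so "the" or "person" are not touched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in SWAP) + r")\b",
    re.IGNORECASE,
)


def swap_gender(text):
    """Replace each listed gendered word with its counterpart, keeping case."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP[word.lower()]
        if len(word) > 1 and word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped
    return _PATTERN.sub(repl, text)


def perturb_example(example):
    """Apply the swap to every text field of one QA example (a dict of strings)."""
    return {key: swap_gender(value) for key, value in example.items()}


if __name__ == "__main__":
    example = {
        "story": "The prince saw the queen. He bowed.",
        "question": "Who did the prince see?",
        "answer": "The queen.",
    }
    print(perturb_example(example))
```

Swapping the story, question, and answer together keeps each example internally consistent, so any drop in accuracy on the perturbed set reflects the gender change rather than a mismatch between the question and the text.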

