CL, AI · Sep 3, 2025

Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations

arXiv:2509.04515v1 · 1 citation · h-index: 24
Originality: Synthesis-oriented
AI Analysis

The paper addresses bias in generative AI for occupational stories. Its contribution is incremental: it applies existing methods to new data.

The paper tackles gender and ethnicity bias in AI-generated occupational stories by proposing BAME (Bias Analysis and Mitigation through Explanation), a method that uses model explanations to guide targeted prompt engineering. It reports improvements in demographic representation of 2% to 20% across three large language models.

Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity. This paper investigates gender and ethnicity biases in AI-generated occupational stories. Representation biases are measured before and after applying our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), revealing improvements in demographic representation ranging from 2% to 20%. BAME leverages model-generated explanations to inform targeted prompt engineering, effectively reducing biases without modifying model parameters. By analyzing stories generated across 25 occupational groups, three large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, and GPT-4 Turbo), and multiple demographic dimensions, we identify persistent patterns of overrepresentation and underrepresentation linked to training data stereotypes. Our findings demonstrate that guiding models with their own internal reasoning mechanisms can significantly enhance demographic parity, thereby contributing to the development of more transparent generative AI systems.
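The before-and-after measurement described in the abstract can be illustrated with a minimal sketch. The abstract does not state which representation metric the paper uses, so this example assumes one: the total variation distance between the observed demographic shares in a set of stories and a chosen reference distribution. The function names, the labels, and the reference distribution are all hypothetical, and the toy data is made up for illustration rather than taken from the paper.

```python
from collections import Counter


def representation_shares(labels):
    """Return the fraction of stories whose protagonist carries each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def parity_gap(shares, reference):
    """Total variation distance between observed shares and a reference.

    This metric is an assumption for illustration, not the paper's metric.
    0.0 means the observed shares match the reference exactly.
    """
    groups = set(shares) | set(reference)
    return 0.5 * sum(
        abs(shares.get(g, 0.0) - reference.get(g, 0.0)) for g in groups
    )


# Made-up protagonist labels for one occupation (not data from the paper).
before = ["man"] * 8 + ["woman"] * 2   # stories from the baseline prompt
after = ["man"] * 6 + ["woman"] * 4    # stories from the revised prompt
reference = {"man": 0.5, "woman": 0.5}  # hypothetical target distribution

gap_before = parity_gap(representation_shares(before), reference)
gap_after = parity_gap(representation_shares(after), reference)

print(f"gap before: {gap_before:.2f}, gap after: {gap_after:.2f}")
```

A smaller gap after the prompt revision would indicate that the generated stories moved closer to the chosen reference. The choice of reference (uniform, census-based, or otherwise) is itself a design decision that this sketch leaves open.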
