Feb 8, 2025

Gender Bias in Instruction-Guided Speech Synthesis Models

arXiv:2502.05649v1, 16 citations, h-index: 12, NAACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses a fairness issue that affects users of speech synthesis systems, but its contribution is incremental: it builds on existing bias research in controllable TTS models.

The study investigates gender bias in instruction-guided speech synthesis models when they interpret occupation-related prompts such as "Act like a nurse". It finds that the models exhibit gender bias for certain occupations, and that the degree of bias varies with model size across those occupations.

Recent advancements in controllable expressive speech synthesis, especially in text-to-speech (TTS) models, have allowed for the generation of speech with specific styles guided by textual descriptions, known as style prompts. While this development enhances the flexibility and naturalness of synthesized speech, there remains a significant gap in understanding how these models handle vague or abstract style prompts. This study investigates the potential gender bias in how models interpret occupation-related prompts, specifically examining their responses to instructions like "Act like a nurse". We explore whether these models exhibit tendencies to amplify gender stereotypes when interpreting such prompts. Our experimental results reveal the model's tendency to exhibit gender bias for certain occupations. Moreover, models of different sizes show varying degrees of this bias across these occupations.
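
To make the question concrete, here is a minimal sketch of how a per-occupation gender skew could be scored. It is not the authors' metric: the abstract does not say how perceived gender was determined or how bias was quantified. The sketch assumes each synthesized utterance has already been labelled "female" or "male" by some external classifier or by listeners. The function name `gender_skew` and the sample labels are hypothetical and invented for illustration only.

```python
from collections import Counter


def gender_skew(labels):
    """Return (female share - male share) for a list of perceived-gender labels.

    +1.0 means every utterance was perceived as female, -1.0 means every
    utterance was perceived as male, and 0.0 means an even split.
    Labels other than "female" and "male" are ignored.
    """
    counts = Counter(labels)
    total = counts["female"] + counts["male"]
    if total == 0:
        raise ValueError("no gendered labels to score")
    return (counts["female"] - counts["male"]) / total


# Hypothetical labels for utterances generated from prompts such as
# "Act like a nurse". These are NOT results from the paper.
samples = {
    "nurse": ["female", "female", "female", "male"],
    "engineer": ["male", "male", "female", "male"],
}

skew_by_occupation = {occ: gender_skew(labs) for occ, labs in samples.items()}

for occupation, skew in skew_by_occupation.items():
    print(f"{occupation}: {skew:+.2f}")
```

Under these assumptions, running the same scoring on outputs from models of different sizes would give one skew value per model and occupation, which is the kind of comparison the abstract describes.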