CL | AI | Oct 24, 2023

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

arXiv:2310.15752v1132 citations | h-index: 34
Originality: Highly original
AI Analysis

This paper addresses gender bias in speech translation systems, offering a practical inference-time solution for users who need accurate gender inflection.

The paper tackles the problem of controlling speaker-related gender inflections in speech translation without requiring model retraining. For feminine forms, it improves gender accuracy by up to 31.0 points over the base models and by up to 1.6 points over the best training-time mitigation strategy.

When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.
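The idea of partially replacing the decoder's internal LM with a gender-specific external LM can be illustrated with a toy rescoring step. The following is a minimal sketch, not the authors' implementation: it assumes the combination is a weighted sum of log-probabilities in which the internal LM score is subtracted and the external LM score is added. The function names, the weights `beta_ilm` and `beta_ext`, and all probabilities are hypothetical values chosen for illustration; the abstract does not specify how the internal LM is estimated or how the weights are set.

```python
import math


def log_softmax(scores):
    """Renormalize a list of log-scores into log-probabilities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def rescore(st_logp, ilm_logp, ext_logp, beta_ilm=0.5, beta_ext=0.5):
    """Combine next-token log-probabilities from three sources.

    st_logp  : ST decoder log-probs (conditioned on the audio)
    ilm_logp : estimate of the decoder's internal LM log-probs
    ext_logp : gender-specific external LM log-probs
    The weights are illustrative assumptions, not values from the paper.
    """
    combined = [
        s - beta_ilm * i + beta_ext * e
        for s, i, e in zip(st_logp, ilm_logp, ext_logp)
    ]
    return log_softmax(combined)


def pick_token(vocab, st_logp, ilm_logp, ext_logp, **weights):
    """Return the highest-scoring token after rescoring."""
    scores = rescore(st_logp, ilm_logp, ext_logp, **weights)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]


# Toy example: Italian "tired" in masculine vs. feminine form.
vocab = ["stanco", "stanca"]
st_logp = [math.log(0.6), math.log(0.4)]    # ST model leans masculine
ilm_logp = [math.log(0.7), math.log(0.3)]   # internal LM is biased masculine
ext_logp = [math.log(0.1), math.log(0.9)]   # external LM trained on feminine text

baseline = pick_token(vocab, st_logp, ilm_logp, ext_logp, beta_ilm=0.0, beta_ext=0.0)
controlled = pick_token(vocab, st_logp, ilm_logp, ext_logp)
print(baseline, controlled)
```

With both weights set to zero the sketch reduces to the base ST model, which here picks the masculine form. With the illustrative weights of 0.5, the biased internal prior is discounted and the feminine external LM shifts the choice to the feminine form, without touching the ST model's parameters.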

Code Implementations: 1 repo