How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
This work addresses gender inclusivity in speech translation into languages with grammatical gender, though it is incremental, as it builds on prior work on gender-specific models.
The paper tackles gender bias in speech translation from notional to grammatical gender languages by integrating speaker gender metadata into a single multi-gender model. When trained from scratch, this model outperforms gender-specialized models, with gender accuracy gains of up to 12.9 for feminine forms.
When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly misleading vocal traits of the speaker or default to the masculine gender, the most frequent in existing training corpora. To avoid such biased and non-inclusive behavior, the gender assignment of speaker-related expressions should be guided by externally provided metadata about the speaker's gender. While previous work has shown that the most effective solution is to use separate, dedicated gender-specific models, the goal of this paper is to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, which is easier to maintain. Our experiments demonstrate that a single multi-gender model outperforms gender-specialized ones when trained from scratch (with gender accuracy gains up to 12.9 for feminine forms), while fine-tuning from existing ST models does not lead to competitive results.
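The abstract does not say how the gender metadata is fed to the model. As a minimal sketch of one possible approach, the Python snippet below prepends a gender tag token to the target sequence, so that a single model can be conditioned on the speaker's gender at training and inference time. The tag strings, the `GENDER_TAGS` mapping, and the `add_gender_tag` helper are hypothetical names chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical tag vocabulary (an assumption, not the paper's actual tokens).
GENDER_TAGS: Dict[str, str] = {"F": "<gender:F>", "M": "<gender:M>"}


def add_gender_tag(target_tokens: List[str], speaker_gender: str) -> List[str]:
    """Return a copy of the target tokens with a gender tag prepended.

    The tag comes from externally provided metadata, not from the audio,
    so the model is never asked to guess gender from vocal traits.
    """
    if speaker_gender not in GENDER_TAGS:
        raise ValueError(f"unknown speaker gender label: {speaker_gender!r}")
    return [GENDER_TAGS[speaker_gender]] + list(target_tokens)


# The same English source ("I have been ...") maps to different Italian forms.
feminine = add_gender_tag(["sono", "stata"], "F")
masculine = add_gender_tag(["sono", "stato"], "M")
print(feminine)
print(masculine)
```

With a scheme like this, the tag would be forced as the first decoder token at inference time, which is how one model could replace two gender-specific ones. Other integration points, such as adding a gender embedding to the encoder output, are equally plausible under the abstract's description.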