CL · Jun 2, 2025

Different Speech Translation Models Encode and Translate Speaker Gender Differently

arXiv:2506.02172v1 · 3 citations · h-index: 34 · ACL
Originality: Incremental advance
AI Analysis

This paper addresses gender bias in speech translation systems, an incremental but important step toward fairer multilingual AI applications.

The study investigated whether speech translation models encode speaker gender and how that encoding affects gender assignment in translations. It found that traditional encoder-decoder models capture gender information, while newer adapter-based architectures do not. Weak gender encoding leads to a masculine default bias, which is more pronounced in the newer models.

Recent studies on interpreting the hidden states of speech models have shown their ability to capture speaker-specific features, including gender. Does this finding also hold for speech translation (ST) models? If so, what are the implications for the speaker's gender assignment in translation? We address these questions from an interpretability perspective, using probing methods to assess gender encoding across diverse ST models. Results on three language directions (English-French/Italian/Spanish) indicate that while traditional encoder-decoder models capture gender information, newer architectures -- integrating a speech encoder with a machine translation system via adapters -- do not. We also demonstrate that low gender encoding capabilities result in systems' tendency toward a masculine default, a translation bias that is more pronounced in newer architectures.
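The probing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' setup: the summary does not specify the probe architecture, the layers probed, or the data, so everything below is an assumption. The "hidden states" are synthetic Gaussian vectors, `signal` is a hypothetical knob that controls how strongly the gender label is encoded in them, and the probe is a plain logistic-regression classifier trained with stochastic gradient descent. Held-out probe accuracy well above chance suggests the representation encodes the label; accuracy near 50% suggests it does not.

```python
import math
import random


def make_states(n, dim, signal, rng):
    """Synthetic stand-in for pooled encoder states with binary gender labels.

    Assumption: `signal` shifts one coordinate by +/- signal depending on the
    label. signal=0 means the label is not encoded at all.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data


def train_probe(data, dim, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with per-example gradient steps."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            z = sum(wi * xi for wi, xi in zip(w, vec)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, vec)]
            b -= lr * g
    return w, b


def accuracy(probe, data):
    """Fraction of examples whose label the probe predicts correctly."""
    w, b = probe
    correct = 0
    for vec, label in data:
        z = sum(wi * xi for wi, xi in zip(w, vec)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(data)


def probe_accuracy(signal, seed=0, dim=8, n_train=400, n_test=200):
    """Train on one synthetic split and report accuracy on a held-out split."""
    rng = random.Random(seed)
    train = make_states(n_train, dim, signal, rng)
    test = make_states(n_test, dim, signal, rng)
    return accuracy(train_probe(train, dim), test)


if __name__ == "__main__":
    print("gender encoded:    ", probe_accuracy(signal=2.0))
    print("gender not encoded:", probe_accuracy(signal=0.0))
```

In a real study the synthetic vectors would be replaced by hidden states extracted from each ST model, and the probe accuracy would be compared across architectures and against a chance-level baseline.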

