CL · May 13, 2025

Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation

arXiv:2505.08546v1 · 5 citations · h-index: 18 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses gender bias in machine translation. Its contribution is an incremental improvement in evaluation methods for that specific domain.

The authors tackled gender bias in Neural Machine Translation by proposing Minimal Pair Accuracy (MPA) to measure reliance on gender cues, showing that models often ignore feminine cues and rely on stereotypes, with masculine cues eliciting more diffused attention responses.

While gender bias in modern Neural Machine Translation (NMT) systems has received much attention, traditional evaluation metrics do not fully capture the extent to which these systems integrate contextual gender cues. We propose a novel evaluation metric called Minimal Pair Accuracy (MPA), which measures the reliance of models on gender cues for gender disambiguation. MPA is designed to go beyond surface-level gender accuracy metrics by focusing on whether models adapt to gender cues in minimal pairs -- sentence pairs that differ solely in the gendered pronoun, namely the explicit indicator of the target entity's gender in the source language (EN). We evaluate a number of NMT models on the English-Italian (EN--IT) language pair using this metric and show that they ignore available gender cues in most cases in favor of a (statistical) stereotypical gender interpretation. We further show that in anti-stereotypical cases, these models tend to take masculine gender cues into account more consistently while ignoring the feminine cues. Furthermore, we analyze the attention head weights in the encoder component and show that while all models encode gender information to some extent, masculine cues elicit a more diffused response compared to the more concentrated and specialized responses to feminine gender cues.
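The contrast between per-sentence gender accuracy and a pair-level metric can be illustrated with a short sketch. The abstract does not give the exact MPA formula, so the definition below is an assumption: a minimal pair counts as correct only when both translations carry the gender indicated by their respective source pronoun. The `MinimalPair` class, its field names, and the "M"/"F" labels are hypothetical; in practice the labels would come from some gender annotation of the Italian output.

```python
from dataclasses import dataclass


@dataclass
class MinimalPair:
    # Hypothetical labels: grammatical gender ("M" or "F") realized for the
    # target entity in the Italian translation of each source sentence.
    pred_for_masc_source: str  # translation of the source containing "he"/"his"
    pred_for_fem_source: str   # translation of the source containing "she"/"her"


def gender_accuracy(pairs):
    """Per-sentence accuracy: each translation is scored independently."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(
        (p.pred_for_masc_source == "M") + (p.pred_for_fem_source == "F")
        for p in pairs
    )
    return correct / (2 * len(pairs))


def minimal_pair_accuracy(pairs):
    """Assumed pair-level metric: a pair is correct only if BOTH translations
    follow the gender cue of their own source sentence."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(
        p.pred_for_masc_source == "M" and p.pred_for_fem_source == "F"
        for p in pairs
    )
    return correct / len(pairs)


# Toy data: a model that mostly defaults to one gender regardless of the cue.
pairs = [
    MinimalPair("M", "F"),  # adapts to the cue in both sentences
    MinimalPair("M", "M"),  # ignores the feminine cue
    MinimalPair("M", "M"),  # ignores the feminine cue
    MinimalPair("F", "F"),  # ignores the masculine cue
]
print(gender_accuracy(pairs))        # 0.625
print(minimal_pair_accuracy(pairs))  # 0.25
```

On this toy data, a model that rarely changes its output when the pronoun changes still scores 0.625 on per-sentence accuracy, while the pair-level score drops to 0.25. This is the kind of gap a minimal-pair metric is meant to expose.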

Code Implementations: 1 repo
