CL, AI, LG · Dec 15, 2024

Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models

arXiv:2412.11187v1 · 19 citations · h-index: 20 · COLING
Originality: Synthesis-oriented
AI Analysis

This work addresses pronoun disambiguation in machine translation and offers an incremental contribution to translation quality.

The paper investigates the role of attention heads in context-aware machine translation models for pronoun disambiguation in English-to-German and English-to-French. It finds that fine-tuning specific heads can increase pronoun disambiguation accuracy by up to 5 percentage points.

In this paper, we investigate the role of attention heads in context-aware machine translation models for pronoun disambiguation in the English-to-German and English-to-French language directions. We analyze their influence by both observing and modifying the attention scores that correspond to the plausible relations that could affect a pronoun prediction. Our findings reveal that some heads do attend to the relations of interest, but not all of them influence the models' ability to disambiguate pronouns. We show that the models underutilize certain heads. This suggests that model performance could improve if those heads attended more strongly to one of the relations. Finally, we fine-tune the most promising heads and observe an increase in pronoun disambiguation accuracy of up to 5 percentage points. This demonstrates that the performance improvements can be solidified into the models' parameters.
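The sketch below illustrates what "observing and modifying" an attention score can mean, using a single attention head in pure Python. The setup is a toy assumption and does not come from the paper. It uses hand-picked 2-dimensional query and key vectors, with a pronoun at position 2 and its antecedent at position 0. Observing means reading the softmax weight that the pronoun's query assigns to the antecedent's key. Modifying is modeled here as a hypothetical intervention that adds a constant `delta` to that single pre-softmax score. The paper's actual intervention may differ. The names `head_attention`, `bias`, `PRONOUN`, and `ANTECEDENT` are illustrative only.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(
    queries: List[Vector],
    keys: List[Vector],
    bias: Optional[Tuple[int, int, float]] = None,
) -> List[Vector]:
    """Return the attention weights of one head (one row per query position).

    bias = (query_pos, key_pos, delta) adds delta to a single pre-softmax
    score. This is a hypothetical stand-in for 'modifying the attention score'
    of a pronoun-antecedent relation, not the paper's exact procedure.
    """
    d = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot-product scores of query i against every key.
        scores = [sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d) for k in keys]
        if bias is not None and bias[0] == i:
            scores[bias[1]] += bias[2]
        weights.append(softmax(scores))
    return weights


# Toy sequence (assumption): token 0 = antecedent, token 2 = pronoun.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = keys
PRONOUN, ANTECEDENT = 2, 0

# Observe: how much the pronoun attends to the antecedent.
before = head_attention(queries, keys)[PRONOUN][ANTECEDENT]
# Modify: boost that one relation and observe again.
after = head_attention(queries, keys, bias=(PRONOUN, ANTECEDENT, 2.0))[PRONOUN][ANTECEDENT]
print(f"{before:.3f} -> {after:.3f}")  # 0.333 -> 0.787
```

In a real model, the same read-and-perturb step would be applied to one head inside a trained encoder or decoder. Accuracy on a pronoun test set would then be compared before and after the intervention. This toy only shows how a single additive bias shifts the softmax mass toward the antecedent.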

Code implementations: 1 repo
