CL · Sep 21, 2020

Alleviating the Inequality of Attention Heads for Neural Machine Translation

arXiv:2009.09672v2 · 582 citations
AI Analysis

This work addresses a specific issue in neural machine translation models, but it appears incremental because it builds on known bottlenecks in attention mechanisms.

The paper tackles the problem of unequal attention heads in Transformer models for neural machine translation by proposing a simple masking method, HeadMask, which improves translation quality on multiple language pairs.
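The summary names HeadMask only as a masking method. Below is a minimal sketch of one plausible variant, assuming heads are masked uniformly at random during training. Under that assumption a fresh mask would be drawn for each training batch, the chosen heads' outputs would be zeroed before the heads are concatenated, and no masking would be applied at inference. The names `sample_head_mask`, `apply_head_mask`, and `concat_heads` are hypothetical, the sketch uses plain Python lists instead of a deep-learning framework, and it is not the authors' implementation.

```python
import random
from typing import List

Vector = List[float]


def sample_head_mask(num_heads: int, num_masked: int, rng: random.Random) -> List[int]:
    """Return a 0/1 mask over heads with exactly `num_masked` zeros, chosen uniformly at random.

    Assumption: random head selection is one illustrative masking strategy,
    not confirmed as the paper's exact procedure.
    """
    if not 0 <= num_masked < num_heads:
        raise ValueError("num_masked must leave at least one head active")
    masked = set(rng.sample(range(num_heads), num_masked))
    return [0 if h in masked else 1 for h in range(num_heads)]


def apply_head_mask(head_outputs: List[List[Vector]], mask: List[int]) -> List[List[Vector]]:
    """Zero every position of each masked head's output.

    head_outputs[h][t] is the output vector of head h at position t.
    Surviving heads are NOT rescaled (a choice made for this sketch).
    """
    return [
        [[x * keep for x in vec] for vec in head]
        for head, keep in zip(head_outputs, mask)
    ]


def concat_heads(head_outputs: List[List[Vector]]) -> List[Vector]:
    """Concatenate per-head vectors at each position, as multi-head attention
    does before its output projection."""
    num_positions = len(head_outputs[0])
    return [[x for head in head_outputs for x in head[t]] for t in range(num_positions)]


# Usage: 4 heads, 2 positions, head dimension 2; one head masked for this "batch".
rng = random.Random(0)
heads = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
mask = sample_head_mask(num_heads=4, num_masked=1, rng=rng)
combined = concat_heads(apply_head_mask(heads, mask))
```

Unlike standard dropout, this sketch does not rescale the surviving heads; whether rescaling helps is a design question the page does not settle.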

Recent studies show that the attention heads in the Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, implemented in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
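The abstract attributes the inequality partly to the model's dependence on specific heads. One way to push against that dependence, sketched here as an assumption rather than the paper's stated procedure, is to mask the heads that currently look most important, so training has to route information through the others. `importance_head_mask` is a hypothetical helper, and the per-head importance scores are taken as given; how they would be computed is left open.

```python
from typing import List, Sequence


def importance_head_mask(importance: Sequence[float], num_masked: int) -> List[int]:
    """Return a 0/1 mask that zeros the `num_masked` heads with the highest scores.

    Assumption: masking the top-scoring heads is an illustrative strategy, not
    confirmed as the paper's method. `importance[h]` is a hypothetical per-head
    score supplied by the caller.
    """
    num_heads = len(importance)
    if not 0 <= num_masked < num_heads:
        raise ValueError("num_masked must leave at least one head active")
    # Rank heads by descending score; ties go to the lower head index first.
    ranked = sorted(range(num_heads), key=lambda h: (-importance[h], h))
    masked = set(ranked[:num_masked])
    return [0 if h in masked else 1 for h in range(num_heads)]


# Usage: heads 1 and 3 score highest, so they are the ones masked.
scores = [0.1, 0.9, 0.4, 0.7]
mask = importance_head_mask(scores, num_masked=2)
print(mask)  # [1, 0, 1, 0]
```

Because the mask is deterministic given the scores, a training loop built on this idea would presumably need to refresh the scores as the model changes; otherwise the same heads would stay masked throughout.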
