CL · Sep 21, 2020

Alleviating the Inequality of Attention Heads for Neural Machine Translation

Zewei Sun, Shujian Huang, Xin-Yu Dai, Jiajun Chen

arXiv:2009.09672v225.6582 citations

Originality: Incremental advance

AI Analysis

This work addresses a specific issue in neural machine translation models, but it appears incremental as it builds on known bottlenecks in attention mechanisms.

The paper tackles the problem of unequal attention heads in Transformer models for neural machine translation. It proposes a simple masking method, HeadMask, which improves translation quality on multiple language pairs.

Recent studies show that the attention heads in Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
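
The abstract does not spell out the two HeadMask variants, so the following is only a minimal sketch of the general idea of masking attention heads during training. It assumes one plausible form, zeroing the outputs of a few randomly chosen heads per step, which may differ from the authors' method. The function name `mask_random_heads`, the parameter `num_masked`, and the toy tensor shape are all hypothetical.

```python
import random


def mask_random_heads(head_outputs, num_masked, rng):
    """Zero the outputs of `num_masked` randomly chosen attention heads.

    head_outputs: nested list of shape [num_heads][seq_len][head_dim].
    Returns (masked_outputs, keep), where keep[h] is 1.0 for a kept head
    and 0.0 for a masked head.
    Assumption: this is an illustrative form of head masking, not
    necessarily either variant described in the paper.
    """
    num_heads = len(head_outputs)
    if not 0 <= num_masked <= num_heads:
        raise ValueError("num_masked must be between 0 and num_heads")

    # Pick the heads to silence for this training step.
    masked = set(rng.sample(range(num_heads), num_masked))
    keep = [0.0 if h in masked else 1.0 for h in range(num_heads)]

    # Multiply every value of a head by its keep flag.
    masked_outputs = [
        [[x * keep[h] for x in row] for row in head]
        for h, head in enumerate(head_outputs)
    ]
    return masked_outputs, keep


# Toy input: 8 heads, 3 positions, 4 dimensions per head.
rng = random.Random(0)
heads = [
    [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    for _ in range(8)
]
masked_heads, keep = mask_random_heads(heads, num_masked=2, rng=rng)
```

In a setup like this, masking would apply only during training, with all heads kept at inference. Whether to rescale the surviving heads, as dropout does, is a design choice the abstract does not address.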
