CL · AI · Sep 11, 2021

Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model

arXiv:2109.05244v2668 citations
AI Analysis

This work addresses translation quality issues in NMT, particularly for long sentences, but is incremental as it builds on existing attention mechanisms.

The paper tackles dispersion in cross-attention for neural machine translation by modeling concentrated attention with a Gaussian Mixture Model, and reports improved alignment quality, N-gram accuracy, and long sentence translation on three datasets.

Cross-attention is an important component of neural machine translation (NMT), and previous methods have always realized it with dot-product attention. However, dot-product attention considers only the pairwise correlation between words, which leads to dispersed attention on long sentences and neglects the relationships between neighboring source words. Drawing on linguistics, we attribute these issues to ignoring a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model the concentrated attention in cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and significantly improves alignment quality, N-gram accuracy, and long sentence translation.
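The abstract does not spell out the formulation, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that concentrated attention can be pictured as a mixture of Gaussians over source positions, each centered on one "central word", and that it is blended with ordinary dot-product weights through a fixed gate. The function names (`gmm_attention`, `combine`), the `gate` parameter, and the fixed centers and widths are all hypothetical; in a trained model these quantities would presumably be predicted from the decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_attention(src_len, centers, widths, mix_weights):
    """Hypothetical concentrated attention over source positions 0..src_len-1.

    Each mixture component is an unnormalized Gaussian bump centered on a
    "central word" position; the summed bumps are normalized to sum to 1.
    """
    raw = []
    for j in range(src_len):
        density = 0.0
        for mu, sigma, w in zip(centers, widths, mix_weights):
            density += w * math.exp(-((j - mu) ** 2) / (2.0 * sigma ** 2))
        raw.append(density)
    total = sum(raw)
    return [r / total for r in raw]


def combine(dot_weights, gmm_weights, gate=0.5):
    """Blend dispersed dot-product weights with concentrated GMM weights.

    The fixed scalar gate is an assumption made for illustration only.
    """
    return [(1.0 - gate) * d + gate * g for d, g in zip(dot_weights, gmm_weights)]


# Toy example: a uniform (fully dispersed) dot-product distribution over
# five source words, sharpened around the central word at position 2.
dispersed = softmax([0.0, 0.0, 0.0, 0.0, 0.0])
concentrated = gmm_attention(5, centers=[2.0], widths=[1.0], mix_weights=[1.0])
blended = combine(dispersed, concentrated, gate=0.5)
```

In this toy setting the blended distribution still sums to one but places more mass on the central word and its immediate neighbors than the uniform dot-product weights do, which is the intuition behind "focuses on several central words and then spreads around them".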

