CL, Sep 26, 2019

Monotonic Multihead Attention

arXiv:1909.12406v1, 149 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of balancing translation quality and latency in simultaneous machine translation. The advance is incremental, as it builds on existing monotonic attention mechanisms.

The paper tackles simultaneous machine translation by proposing Monotonic Multihead Attention (MMA), which extends monotonic attention to multihead attention, and by introducing two latency control methods designed for multiple attention heads. The result is a better latency-quality tradeoff than MILk, the previous state-of-the-art approach.

Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a state-of-the-art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We also analyze how the latency controls affect the attention span, and we motivate the introduction of our model by analyzing the effect of the number of decoder layers and heads on quality and latency.
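To make the idea of several monotonic heads sharing one read/write decision more concrete, here is a minimal sketch of a hard, inference-style step. It is an illustration under stated assumptions, not the paper's implementation. The function name `mma_read_until_write`, the `p_choose` layout, the 0.5 stopping threshold, and the rule that a target token is written only once every head has stopped reading are all assumptions made for this example.

```python
from typing import List, Tuple


def mma_read_until_write(
    p_choose: List[List[float]],
    prev_pos: List[int],
    threshold: float = 0.5,
) -> Tuple[List[int], int]:
    """Advance each monotonic head for one target step (hypothetical helper).

    p_choose[h][j]: assumed stopping probability of head h at source index j.
    prev_pos[h]:    source index head h attended at the previous target step.
    Returns the new position of every head and the number of source tokens
    that must have been read before the next target token can be written.
    """
    new_pos = []
    for probs, start in zip(p_choose, prev_pos):
        j = start
        # Monotonic scan: a head never moves left of its previous position.
        # It stops at the first index whose probability reaches the threshold,
        # or at the last source index if no position qualifies.
        while j < len(probs) - 1 and probs[j] < threshold:
            j += 1
        new_pos.append(j)
    # Assumption: writing waits for all heads, so the head that moved
    # furthest right determines how much source has to be read.
    return new_pos, max(new_pos) + 1


# Two heads over a four-token source, both starting at index 0.
p = [[0.1, 0.9, 0.2, 0.3],
     [0.2, 0.1, 0.7, 0.4]]
positions, tokens_needed = mma_read_until_write(p, [0, 0])
print(positions, tokens_needed)
```

In this sketch the head that advances furthest sets the latency of the step, which is why latency control for multiple heads has to account for how far apart the heads drift.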

Code Implementations: 3 repos