Sparse and Constrained Attention for Neural Machine Translation
This work addresses translation errors that affect NMT users, namely dropped and repeated words, but the contribution is incremental because it modifies only the attention mechanism.
The paper tackles the coverage problem in neural machine translation, where words are dropped or repeated. It allocates fertilities to source words to bound the attention each word can receive, and it introduces constrained sparsemax, a novel attention transformation that is both differentiable and sparse. It reports improvements in translation quality across three language pairs.
In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, which are used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided on three language pairs.
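
To make the idea of bounded, sparse attention concrete, here is a minimal sketch in plain Python. It assumes the common formulation of sparsemax as the Euclidean projection of the attention scores onto the probability simplex, and treats the constrained variant as the same projection with an extra per-word upper bound. The function names, the argument `u` (the upper bounds, for example the attention budget a source word has left under its fertility) and the bisection solver are illustrative assumptions, not the authors' implementation; bisection is used here only because it is short and easy to check.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex (sort-based threshold search)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        # The k largest scores stay in the support while this condition holds.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(zi - tau, 0.0) for zi in z]


def constrained_sparsemax(z, u, iters=60):
    """Same projection with per-item upper bounds u (hypothetical helper).

    The solution has the form p_i = clip(z_i - tau, 0, u_i); the threshold
    tau is found by bisection so that the probabilities sum to 1.
    """
    if len(z) != len(u):
        raise ValueError("z and u must have the same length")
    if sum(u) < 1.0:
        raise ValueError("upper bounds must sum to at least 1")

    def total(tau):
        return sum(min(max(zi - tau, 0.0), ui) for zi, ui in zip(z, u))

    lo = min(zi - ui for zi, ui in zip(z, u))  # total(lo) == sum(u) >= 1
    hi = max(z)                                # total(hi) == 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(max(zi - tau, 0.0), ui) for zi, ui in zip(z, u)]


if __name__ == "__main__":
    scores = [1.0, 0.5, -1.0]
    print(sparsemax(scores))                                # [0.75, 0.25, 0.0]
    print(constrained_sparsemax(scores, [0.5, 1.0, 1.0]))   # approximately [0.5, 0.5, 0.0]
```

In this toy example the first word would receive 0.75 of the attention without constraints; capping it at 0.5 pushes the excess mass onto the second word, while the third word still gets exactly zero, so the output stays sparse.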