CL · AI · Sep 19, 2020

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

arXiv:2009.09127v1 · 1002 citations
Originality: Incremental advance
AI Analysis

This work addresses document-level translation for NLP practitioners, offering a simple baseline improvement that is incremental in nature.

The paper tackles error accumulation in document-level neural machine translation by adding a long-short term masking self-attention mechanism to the standard transformer, achieving strong BLEU scores and better handling of discourse phenomena on two public datasets.

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of more parameters and higher computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation, and find that its auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard transformer that both effectively captures long-range dependencies and reduces the propagation of errors. We evaluate our approach on two publicly available document-level datasets, where it achieves strong BLEU scores and captures discourse phenomena.
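The abstract does not spell out how the masks are built, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a document is fed to the transformer as the concatenation of its sentences, that a short-term mask limits each token to tokens of its own sentence, that a long-term mask opens attention to the whole document (left context only when `causal=True`, as in a decoder), and that attention heads are split half and half between the two masks. All function names (`sentence_ids`, `short_term_mask`, `long_term_mask`, `head_masks`, `to_additive`) are hypothetical.

```python
from typing import List

# mask[i][j] is True when query token i may attend to key token j.
Mask = List[List[bool]]


def sentence_ids(lengths: List[int]) -> List[int]:
    """Map each token of the concatenated document to the index of its sentence."""
    ids: List[int] = []
    for sent_idx, n in enumerate(lengths):
        ids.extend([sent_idx] * n)
    return ids


def short_term_mask(lengths: List[int], causal: bool = False) -> Mask:
    """Short-term (assumed): a token sees only tokens of its own sentence."""
    ids = sentence_ids(lengths)
    n = len(ids)
    return [[ids[i] == ids[j] and (not causal or j <= i) for j in range(n)]
            for i in range(n)]


def long_term_mask(lengths: List[int], causal: bool = False) -> Mask:
    """Long-term (assumed): a token sees the whole document (left part if causal)."""
    n = sum(lengths)
    return [[(not causal or j <= i) for j in range(n)] for i in range(n)]


def head_masks(lengths: List[int], num_heads: int, causal: bool = False) -> List[Mask]:
    """Hypothetical head split: first half short-term, remaining heads long-term."""
    short = short_term_mask(lengths, causal)
    long_ = long_term_mask(lengths, causal)
    n_short = num_heads // 2
    return [short if h < n_short else long_ for h in range(num_heads)]


def to_additive(mask: Mask) -> List[List[float]]:
    """Turn a boolean mask into a bias added to attention logits before softmax."""
    return [[0.0 if ok else float("-inf") for ok in row] for row in mask]


if __name__ == "__main__":
    # Two sentences of 2 and 3 tokens, decoder-style (causal) masks.
    print(short_term_mask([2, 3], causal=True)[3])  # → [False, False, True, True, False]
```

Restricting some heads to the current sentence is one plausible way to limit how far a mistranslation in an earlier sentence propagates, while the remaining heads keep document-wide context; the paper's actual head split and mask layout may differ.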

