CL · AI · Aug 22, 2018

Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation

arXiv:1808.07374v21099 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in neural machine translation: attention is applied in the same way at every decoding step, which limits translation quality. The advance is incremental, since it builds on existing attention mechanisms.

The paper targets a weakness of conventional attention in neural machine translation: every decoding step is treated equally, which is suboptimal because different word types call for different attention softness. It proposes a Self-Adaptive Control of Temperature (SACT) mechanism that adjusts attention softness through an attention temperature. In the reported experiments, the model outperformed baseline models on Chinese-English and English-Vietnamese translation tasks, and the accompanying analysis and case studies support that result.

Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on Chinese-English and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate high-quality translations.
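The abstract does not give SACT's equations, so the following is only a minimal sketch of the general idea of an attention temperature, not the authors' implementation. It assumes the attention scores are divided by a temperature before the softmax, and that the temperature comes from a gate value in [-1, 1] raised as an exponent of a fixed base. The names `softmax_with_temperature`, `adaptive_temperature`, `gate`, and `base` are hypothetical and chosen for illustration; in a real model the gate would be predicted by the decoder at each step.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Softmax over attention scores, each divided by a temperature.

    A temperature below 1 sharpens the distribution (concentrated attention);
    a temperature above 1 flattens it (diverted, softer attention).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(gate, base=2.0):
    """Hypothetical mapping from a gate in [-1, 1] to a temperature.

    Assumption for illustration: temperature = base ** gate, so the
    temperature stays within [1 / base, base].
    """
    if not -1.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [-1, 1]")
    return base ** gate


def entropy(probs):
    """Shannon entropy in nats; lower means more concentrated attention."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Made-up alignment scores over three source positions.
scores = [2.0, 1.0, 0.1]

# One decoding step with a low gate (sharp) and one with a high gate (soft).
sharp = softmax_with_temperature(scores, adaptive_temperature(-1.0))
soft = softmax_with_temperature(scores, adaptive_temperature(1.0))
```

Under these assumptions, `sharp` places more of its mass on the highest-scoring source position than `soft` does, which is the kind of per-step control over softness that the abstract describes for different word types.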

Code Implementations: 1 repo
