CLMay 31, 2017

Learning When to Attend for Neural Machine Translation

arXiv:1705.11160v12.16 citations

Originality Highly original

AI Analysis

This addresses a specific inefficiency in neural machine translation for translation tasks, though it is incremental as it builds on existing attention models.

The paper tackled the problem that standard attention mechanisms in neural machine translation always attend to source words, even when target words have no corresponding source words, by proposing a novel attention model that decides when to attend, resulting in a 0.8 BLEU score improvement on NIST Chinese-English translation tasks.

In the past few years, attention mechanisms have become an indispensable component of end-to-end neural machine translation models. However, previous attention models always refer to some source words when predicting a target word, which contradicts with the fact that some target words have no corresponding source words. Motivated by this observation, we propose a novel attention model that has the capability of determining when a decoder should attend to source words and when it should not. Experimental results on NIST Chinese-English translation tasks show that the new model achieves an improvement of 0.8 BLEU score over a state-of-the-art baseline.

View on arXiv PDF

Similar