CL | Oct 31, 2018

You May Not Need Attention

arXiv:1810.13409v1 | 30 citations

Originality: Incremental advance

AI Analysis

This work addresses efficiency and latency issues in neural machine translation, offering a simpler alternative to attention-based models.

The paper approaches neural machine translation with a recurrent model that uses neither attention nor a separate encoder and decoder. It performs comparably to the standard attention-based model and better on long sentences.

In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
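The abstract names two properties: the model writes output after reading only the first source token, and decoding uses constant memory. The sketch below illustrates both. It is a hypothetical toy, not the authors' implementation. It assumes a single tanh RNN whose input at each step is the current source embedding concatenated with the previously written target embedding. It also assumes random untrained weights, greedy argmax output, and made-up `PAD`/`EOS`/`BOS` ids. The names `EagerRNN` and `stream` are illustrative only.

```python
import math
import random

PAD, EOS, BOS = 0, 1, 2  # hypothetical special-token ids


class EagerRNN:
    """Toy single-RNN translator with random (untrained) weights.

    One recurrent state handles both reading and writing. There is no
    separate encoder/decoder and no attention over earlier states.
    """

    def __init__(self, src_vocab, tgt_vocab, emb=8, hidden=16, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.src_emb = rand(src_vocab, emb)
        self.tgt_emb = rand(tgt_vocab, emb)
        self.w_in = rand(hidden, 2 * emb)  # input = [source emb ; previous target emb]
        self.w_h = rand(hidden, hidden)
        self.w_out = rand(tgt_vocab, hidden)
        self.hidden = hidden

    @staticmethod
    def _matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    def step(self, h, src_tok, prev_tgt):
        """One recurrent update: read one source token, write one target token."""
        x = self.src_emb[src_tok] + self.tgt_emb[prev_tgt]  # list concatenation
        pre = [a + b for a, b in zip(self._matvec(self.w_in, x), self._matvec(self.w_h, h))]
        h = [math.tanh(p) for p in pre]
        logits = self._matvec(self.w_out, h)
        return h, max(range(len(logits)), key=logits.__getitem__)  # greedy argmax

    def stream(self, source_tokens, max_extra=20):
        """Yield one target token per source token, then continue on PAD input.

        Only the fixed-size state `h` and the last output token are carried
        between steps, so decoding memory does not grow with sentence length.
        """
        h = [0.0] * self.hidden
        prev = BOS
        for src_tok in source_tokens:  # read one source token ...
            h, prev = self.step(h, src_tok, prev)
            yield prev  # ... and write one target token immediately
        for _ in range(max_extra):  # source exhausted: keep writing on PAD input
            if prev == EOS:
                break
            h, prev = self.step(h, PAD, prev)
            yield prev
```

Because `stream` is a generator, a caller can consume target tokens while source tokens are still arriving. For example, `next(EagerRNN(10, 12).stream(iter([3, 4, 5])))` returns a target id after only `3` has been read. With untrained weights the ids are arbitrary. The sketch shows only the control flow and memory footprint, not translation quality.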

