CL · Oct 31, 2018

You May Not Need Attention

arXiv:1810.13409v1 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency and latency issues in neural machine translation, offering a simpler alternative to attention-based models.

The paper introduces a recurrent translation model that has no attention mechanism and no separate encoder and decoder. It performs comparably to standard attention-based models, and better on long sentences.

In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
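The "eager" read-one, write-one loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `EagerToyTranslator`, its plain tanh recurrent cell, the random untrained weights, and the `pad`/`eos` ids are all hypothetical and chosen only for illustration. The sketch shows a single recurrent cell that consumes the current source token together with the previously written target token, emits a target token at every step, and carries nothing between steps except a fixed-size state vector.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Small random weights; the toy model is untrained.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class EagerToyTranslator:
    """Hypothetical toy: one recurrent cell, no attention, no encoder/decoder split."""

    def __init__(self, src_vocab, tgt_vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.src_emb = make_matrix(src_vocab, hidden, rng)
        self.tgt_emb = make_matrix(tgt_vocab, hidden, rng)
        self.w_in = make_matrix(hidden, 2 * hidden, rng)
        self.w_rec = make_matrix(hidden, hidden, rng)
        self.w_out = make_matrix(tgt_vocab, hidden, rng)

    def step(self, state, src_id, prev_tgt_id):
        # Input is the current source embedding concatenated with the
        # embedding of the target token written at the previous step.
        x = self.src_emb[src_id] + self.tgt_emb[prev_tgt_id]
        pre = [a + b for a, b in zip(matvec(self.w_in, x),
                                     matvec(self.w_rec, state))]
        new_state = [math.tanh(p) for p in pre]
        scores = matvec(self.w_out, new_state)
        tgt_id = max(range(len(scores)), key=scores.__getitem__)  # greedy pick
        return new_state, tgt_id

    def translate(self, src_ids, src_pad=0, tgt_pad=0, eos=1, max_extra=5):
        state = [0.0] * self.hidden  # the only memory kept across steps
        prev = tgt_pad
        written = []
        # Write one target token for every source token read.
        for src_id in src_ids:
            state, prev = self.step(state, src_id, prev)
            written.append(prev)
        # Source exhausted: keep feeding padding until eos or a step limit.
        for _ in range(max_extra):
            if prev == eos:
                break
            state, prev = self.step(state, src_pad, prev)
            written.append(prev)
        return written, state


model = EagerToyTranslator(src_vocab=10, tgt_vocab=12)
tokens, final_state = model.translate([3, 4, 5])
print(len(tokens), len(final_state))
```

Because the state has a fixed length regardless of how many source tokens have been read, per-step memory in this sketch does not grow with sentence length, which is the property the abstract refers to as constant memory during decoding. A real system would be trained and would strip any padding tokens from the written output.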

Code Implementations: 1 repo
