CL | LG | ML | May 12, 2016

Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model

arXiv:1605.03835v1 | 66 citations
Originality: Incremental advance
AI Analysis

This work addresses decoding inefficiencies in neural machine translation, offering a parallelizable decoding method aimed at faster and more accurate translation. It is an incremental advance, since it builds on existing attention-based models.

The paper tackles the problem of limited decoding strategies in conditional recurrent language models by proposing a novel, parallelizable decoding algorithm that improves upon existing methods, achieving a BLEU score improvement of 0.5 points on En->Cz translation.

Recent advances in conditional recurrent language modelling have mainly focused on network architectures (e.g., attention mechanism), learning algorithms (e.g., scheduled sampling and sequence-level training) and novel applications (e.g., image/video description generation, speech recognition, etc.). On the other hand, we notice that decoding algorithms/strategies have not been investigated as much, and it has become standard to use greedy or beam search. In this paper, we propose a novel decoding strategy motivated by an earlier observation that nonlinear hidden layers of a deep neural network stretch the data manifold. The proposed strategy is embarrassingly parallelizable without any communication overhead, while improving an existing decoding algorithm. We extensively evaluate it with attention-based neural machine translation on the task of En->Cz translation.
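
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a noisy, parallel decoding scheme of this kind could look like. It assumes that Gaussian noise is added to the recurrent hidden state at every step, that the noise scale shrinks as sigma0 / t, that each chain decodes greedily, and that the final output is the candidate with the highest log-probability under the noise-free model. The toy model, every function name, and the choice to include a noise-free greedy chain among the candidates are hypothetical and are not taken from the paper.

```python
import math
import random

VOCAB = ["<eos>", "a", "b", "c"]  # toy vocabulary; index 0 doubles as the start symbol
HIDDEN = 4                        # toy hidden-state size


def make_toy_model(seed=0):
    """Build random weights for a tiny tanh recurrent model (illustrative only)."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    w_x = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(len(VOCAB))]
    w_o = [[rng.uniform(-1, 1) for _ in range(len(VOCAB))] for _ in range(HIDDEN)]
    return w_h, w_x, w_o


def step(model, h, token):
    """One recurrent transition: h_t = tanh(h_{t-1} W_h + W_x[token])."""
    w_h, w_x, _ = model
    return [
        math.tanh(sum(h[i] * w_h[i][j] for i in range(HIDDEN)) + w_x[token][j])
        for j in range(HIDDEN)
    ]


def log_probs(model, h):
    """Log-softmax over the vocabulary given a hidden state."""
    w_o = model[2]
    logits = [sum(h[i] * w_o[i][k] for i in range(HIDDEN)) for k in range(len(VOCAB))]
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def noisy_greedy_decode(model, sigma0, rng, max_len=10):
    """Greedy decoding with Gaussian noise added to the hidden state.

    Assumption: the noise standard deviation at step t is sigma0 / t.
    With sigma0 = 0 this reduces to ordinary greedy decoding.
    """
    h, token, out = [0.0] * HIDDEN, 0, []
    for t in range(1, max_len + 1):
        h = step(model, h, token)
        sigma = sigma0 / t
        h = [v + rng.gauss(0.0, sigma) for v in h]
        lp = log_probs(model, h)
        token = max(range(len(lp)), key=lp.__getitem__)
        out.append(token)
        if token == 0:  # stop at <eos>
            break
    return out


def score(model, tokens):
    """Log-probability of a token sequence under the noise-free model."""
    h, prev, total = [0.0] * HIDDEN, 0, 0.0
    for tok in tokens:
        h = step(model, h, prev)
        total += log_probs(model, h)[tok]
        prev = tok
    return total


def noisy_parallel_decode(model, num_chains=8, sigma0=0.5, seed=0):
    """Run independent noisy chains and keep the best-scoring candidate.

    The chains share nothing, so they could run on separate workers; they
    are run in a plain loop here to keep the sketch short. A noise-free
    greedy chain is included so the result never scores below greedy.
    """
    candidates = [noisy_greedy_decode(model, 0.0, random.Random(seed))]
    for m in range(num_chains):
        chain_rng = random.Random(seed + 1 + m)
        candidates.append(noisy_greedy_decode(model, sigma0, chain_rng))
    return max(candidates, key=lambda c: score(model, c))
```

Because each chain only needs its own random seed and a copy of the model, the loop in `noisy_parallel_decode` could be distributed across workers with a single final comparison of scores, which is one way to read the abstract's claim of parallelism without communication overhead.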

