Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model
This work addresses the decoding stage of neural machine translation, offering a parallelizable strategy that improves on an existing decoding algorithm, though it is incremental in that it builds on existing attention-based models.
The paper tackles the problem of limited decoding strategies in conditional recurrent language models by proposing a novel, parallelizable decoding algorithm that improves upon existing methods, achieving a BLEU score improvement of 0.5 points on En->Cz translation.
Recent advances in conditional recurrent language modelling have mainly focused on network architectures (e.g., attention mechanisms), learning algorithms (e.g., scheduled sampling and sequence-level training) and novel applications (e.g., image/video description generation and speech recognition). Decoding algorithms and strategies, on the other hand, have not been investigated as much, and it has become standard to use greedy or beam search. In this paper, we propose a novel decoding strategy motivated by an earlier observation that nonlinear hidden layers of a deep neural network stretch the data manifold. The proposed strategy is embarrassingly parallelizable without any communication overhead, while improving an existing decoding algorithm. We extensively evaluate it with attention-based neural machine translation on the task of En->Cz translation.
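The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a noisy, parallel decoding strategy could look like, under stated assumptions: several independent greedy decoding chains each add Gaussian noise to the recurrent hidden state, and the candidate with the highest log-probability under the noiseless model is kept. The toy model (`W`, `E`, `V`), the noise schedule `sigma0 / t`, and all function names are hypothetical choices for illustration and are not taken from the text above.

```python
import math
import random

# Hypothetical toy recurrent language model with random weights.
VOCAB, HIDDEN, EOS, MAX_LEN = 6, 4, 0, 8
_rng = random.Random(0)
W = [[_rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
E = [[_rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
V = [[_rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]


def step(h, prev_token):
    """One recurrent transition; returns the new state and log-probabilities."""
    new_h = [
        math.tanh(sum(W[i][j] * h[j] for j in range(HIDDEN)) + E[prev_token][i])
        for i in range(HIDDEN)
    ]
    logits = [sum(V[k][i] * new_h[i] for i in range(HIDDEN)) for k in range(VOCAB)]
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return new_h, [x - log_z for x in logits]


def noisy_greedy(sigma0, rng):
    """Greedy decoding with Gaussian noise added to the hidden state."""
    h, tok, out = [0.0] * HIDDEN, EOS, []
    for t in range(1, MAX_LEN + 1):
        sigma = sigma0 / t  # assumed schedule: noise shrinks over time
        if sigma > 0:
            h = [x + rng.gauss(0, sigma) for x in h]
        h, logp = step(h, tok)
        tok = max(range(VOCAB), key=lambda k: logp[k])
        out.append(tok)
        if tok == EOS:
            break
    return out


def score(tokens):
    """Log-probability of a token sequence under the noiseless model."""
    h, prev, total = [0.0] * HIDDEN, EOS, 0.0
    for tok in tokens:
        h, logp = step(h, prev)
        total += logp[tok]
        prev = tok
    return total


def noisy_parallel_decode(num_chains=8, sigma0=0.5, seed=1):
    """Run independent chains and keep the best-scoring candidate."""
    # Chain 0 is noiseless, so the result is never worse than plain greedy.
    candidates = [noisy_greedy(0.0, random.Random(seed))]
    for m in range(1, num_chains):
        candidates.append(noisy_greedy(sigma0, random.Random(seed + m)))
    return max(candidates, key=score)
```

Because the chains share no state, the loop in `noisy_parallel_decode` could be distributed across workers, with communication needed only for the final selection. Including one noiseless chain is a design choice of this sketch: it guarantees the selected candidate scores at least as well as ordinary greedy decoding on the same toy model.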