CL NE Jun 13, 2017

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

arXiv:1706.05087v22.99 citations Has Code

Originality: Incremental advance

AI Analysis

This work targets translation accuracy in character-level neural machine translation, but the advance is incremental: it builds on existing encoder-decoder and STRAW models.

The paper tackles character-level neural machine translation by integrating a planning mechanism into an encoder-decoder architecture. The resulting model outperforms a strong baseline on the WMT'15 corpus with fewer parameters.

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation. We develop a model that plans ahead when it computes alignments between the source and target sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the strategic attentive reader and writer (STRAW) model. Our proposed model is end-to-end trainable with fully differentiable operations. We show that it outperforms a strong baseline on three character-level decoder neural machine translation tasks on the WMT'15 corpus. Our analysis demonstrates that our model can compute qualitatively intuitive alignments and achieves superior performance with fewer parameters.
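The plan-then-commit idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`plan_step`, `shift`, `run_demo`), the random scores standing in for learned alignment proposals, the fixed one-hot commitment vector, and the hard (non-differentiable) branch are all assumptions made for illustration. It only shows how a matrix of proposed future alignments and a commitment vector could interact: the first row of the plan gives the current attention, and the first entry of the commitment vector decides whether to recompute the plan or shift it forward.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def shift(m):
    """Advance a plan by one step: drop the first row/entry, pad the end with zeros."""
    out = np.zeros_like(m)
    out[:-1] = m[1:]
    return out


def plan_step(A, c, propose_plan, propose_commitment):
    """One decoding step of the toy planner.

    A: (k, src_len) matrix of proposed alignment scores for the next k steps.
    c: (k,) commitment vector; c[0] == 1 means "recompute the plan now".
    Returns the updated plan, commitment vector, current attention weights,
    and a flag saying whether the plan was recomputed.
    """
    if c[0] == 1:
        # Recompute both the alignment plan and the commitment vector.
        A, c, replanned = propose_plan(), propose_commitment(), True
    else:
        # Follow the existing plan: move one step ahead.
        A, c, replanned = shift(A), shift(c), False
    attention = softmax(A[0])  # the first row is the alignment used now
    return A, c, attention, replanned


def run_demo(steps=6, k=3, src_len=5, seed=0):
    """Run the toy planner; proposals are random stand-ins for learned ones."""
    rng = np.random.default_rng(seed)
    propose_plan = lambda: rng.normal(size=(k, src_len))
    # Hypothetical fixed commitment: stick to each new plan for k steps.
    propose_commitment = lambda: np.eye(k)[k - 1]
    A = np.zeros((k, src_len))
    c = np.eye(k)[0]  # force a plan to be computed at the first step
    history = []
    for _ in range(steps):
        A, c, attention, replanned = plan_step(A, c, propose_plan, propose_commitment)
        history.append((replanned, attention))
    return history


if __name__ == "__main__":
    for t, (replanned, attention) in enumerate(run_demo()):
        print(t, "replan" if replanned else "follow", np.round(attention, 3))
```

With these settings the planner recomputes at steps 0 and 3 and follows the stored plan in between, so each proposed plan row is consumed once. A trainable model would replace the hard branch and random proposals with learned, differentiable counterparts, as the abstract indicates.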
