LG ML · Nov 28, 2017

Plan, Attend, Generate: Planning for Sequence-to-Sequence Models

Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio

arXiv:1711.10462v1 · 7.3 · 12 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing the planning capabilities of sequence-to-sequence models for tasks like translation and question generation, representing an incremental improvement over existing attention-based methods.

The authors tackled the problem of improving sequence-to-sequence models by integrating a planning mechanism that allows the model to anticipate future alignments between input and output sequences. They demonstrated that their model outperforms strong baselines on character-level translation, Eulerian circuit finding, and question generation tasks, achieving superior performance with fewer parameters and faster convergence.

We investigate the integration of a planning mechanism into sequence-to-sequence models using attention. We develop a model which can plan ahead when it computes its alignments between input and output sequences, constructing a matrix of proposed future alignments and a commitment vector that governs whether to follow or recompute the plan. This mechanism is inspired by the recently proposed strategic attentive reader and writer (STRAW) model for Reinforcement Learning. Our proposed model is end-to-end trainable using primarily differentiable operations. We show that it outperforms a strong baseline on character-level translation tasks from WMT'15, the algorithmic task of finding Eulerian circuits of graphs, and question generation from text. Our analysis demonstrates that the model computes qualitatively intuitive alignments, converges faster than the baselines, and achieves superior performance with fewer parameters.
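The plan-and-commit control flow described in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the names `shift`, `decode_alignments` and `diagonal_proposer`, the zero/one padding used when shifting, the wholesale replacement of the plan on a replan, and the hard `threshold` switch (standing in for whatever discrete-sampling scheme a trainable model would need) are all hypothetical. It only shows the loop: at each output step the alignment plan and commitment vector are shifted forward one step, the commitment entry for the current step decides whether to recompute the plan, and the attention weights are read from the first row of the current plan.

```python
import math
from typing import Callable, List, Tuple

Plan = List[List[float]]   # k rows (future output steps) x src_len columns (input positions)
Commit = List[float]       # k entries, one per future output step
Proposer = Callable[[int], Tuple[Plan, Commit]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of alignment scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def shift(plan: Plan, commit: Commit) -> Tuple[Plan, Commit]:
    """Advance the plan one step: drop the first row/entry and pad the end.

    Assumption (hypothetical): the plan is padded with zeros and the commitment
    with 1.0, so the decoder is forced to replan once a k-step plan is used up.
    """
    width = len(plan[0])
    return plan[1:] + [[0.0] * width], commit[1:] + [1.0]


def decode_alignments(num_steps: int, k: int, src_len: int,
                      propose: Proposer, threshold: float = 0.5
                      ) -> List[Tuple[List[float], bool]]:
    """Return (attention weights, replanned?) for each output step."""
    # Initial state: an empty plan and an all-ones commitment force a plan at step 0.
    plan: Plan = [[0.0] * src_len for _ in range(k)]
    commit: Commit = [1.0] * k
    outputs = []
    for t in range(num_steps):
        plan, commit = shift(plan, commit)
        # Hard switch on this step's commitment entry (a simplification; a trainable
        # model needs a differentiable surrogate for this discrete decision).
        replan = commit[0] >= threshold
        if replan:
            # In this sketch the fresh commitment's first entry belongs to the
            # current step and is never read; later entries gate future replans.
            plan, commit = propose(t)
        # Attention for the current step comes from the first row of the plan.
        outputs.append((softmax(plan[0]), replan))
    return outputs


def diagonal_proposer(step: int) -> Tuple[Plan, Commit]:
    """Hypothetical proposer: plan row i peaks at input position (step + i) % 4,
    and the all-zero commitment means 'follow this plan until it runs out'."""
    plan = [[5.0 if j == (step + i) % 4 else 0.0 for j in range(4)] for i in range(3)]
    return plan, [0.0, 0.0, 0.0]


if __name__ == "__main__":
    for t, (attn, replanned) in enumerate(decode_alignments(4, 3, 4, diagonal_proposer)):
        print(t, max(range(4), key=attn.__getitem__), replanned)
```

With an all-zero commitment vector the decoder follows each 3-step plan to the end and replans only when the plan is exhausted, so the attention peaks move monotonically over input positions 0, 1, 2, 3 with replans at steps 0 and 3. Raising an entry of the proposed commitment above `threshold` makes the decoder abandon the plan early at the corresponding step.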
