Towards Neural Machine Translation with Latent Tree Attention
This work addresses the problem of incorporating hierarchical language structure into neural machine translation without requiring annotated data. The contribution is incremental in that it builds on existing attentional methods.
The authors tackled unsupervised learning of hierarchical language structure for machine translation by introducing a model that pairs a recurrent neural network grammar (RNNG) encoder with a novel attentional RNNG decoder. Policy gradient reinforcement learning was used to induce tree structures without annotation, and the model achieved performance close to an attentional baseline on character-level datasets.
Building models that take advantage of the hierarchical structure of language without a priori annotation is a longstanding goal in natural language processing. We introduce such a model for the task of machine translation, pairing a recurrent neural network grammar encoder with a novel attentional RNNG decoder and applying policy gradient reinforcement learning to induce unsupervised tree structures on both the source and target. When trained on character-level datasets with no explicit segmentation or parse annotation, the model learns a plausible segmentation and shallow parse, obtaining performance close to an attentional baseline.
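To make the idea of inducing trees with policy gradients concrete, the following is a minimal sketch, not the authors' implementation. It samples a binary tree over a character sequence with shift/reduce actions and applies a REINFORCE-style update. The names `sample_tree` and `reinforce_step`, the two-logit policy, the reward function, and the baseline are all hypothetical; the abstract does not specify the reward signal, and a real model would condition the action distribution on the encoder or decoder state rather than use two fixed logits.

```python
import math
import random

SHIFT, REDUCE = 0, 1


def sample_tree(tokens, logits, rng):
    """Sample a binary tree over a non-empty `tokens` list.

    `logits` is a hypothetical pair [shift_logit, reduce_logit].
    Returns (tree, actions, grad), where grad is the gradient of the summed
    log-probability of the freely sampled actions w.r.t. the logits.
    """
    z = max(logits)
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    stack, buffer = [], list(tokens)
    actions, grad = [], [0.0, 0.0]
    while buffer or len(stack) > 1:
        can_shift = bool(buffer)
        can_reduce = len(stack) >= 2
        if can_shift and can_reduce:
            a = SHIFT if rng.random() < probs[SHIFT] else REDUCE
            # d/d logit_k of log softmax(logits)[a] = 1[k == a] - probs[k]
            for k in (SHIFT, REDUCE):
                grad[k] += (1.0 if k == a else 0.0) - probs[k]
        else:
            # Only one action is valid, so it is forced and adds no gradient.
            a = SHIFT if can_shift else REDUCE
        if a == SHIFT:
            stack.append(buffer.pop(0))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
        actions.append(a)
    return stack[0], actions, grad


def reinforce_step(tokens, logits, reward_fn, baseline, lr, rng):
    """One REINFORCE update: move logits along (reward - baseline) * grad log pi."""
    tree, _, grad = sample_tree(tokens, logits, rng)
    reward = reward_fn(tree)  # assumed scalar, e.g. a translation quality score
    advantage = reward - baseline
    new_logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return new_logits, tree, reward
```

In a translation setting, `reward_fn` could plausibly be derived from how well the decoder predicts the target given the sampled tree, but that choice is an assumption made here for illustration. The baseline is subtracted only to reduce the variance of the gradient estimate.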