CL · Jan 18, 2019

Modeling Latent Sentence Structure in Neural Machine Translation

Jasmijn Bastings, Wilker Aziz, Ivan Titov, Khalil Sima'an

arXiv:1901.06436v2 · 0.74 citations

Originality: Incremental advance

AI Analysis

This work aims to improve translation quality by treating linguistic structure as a latent variable rather than as the output of a supervised parser. The advance is incremental: it builds on earlier results showing that parser-predicted structure can benefit NMT.

The authors incorporate latent sentence structure into a standard NMT encoder-decoder and induce it so that it benefits translation. On German-English and Japanese-English benchmarks, CNN and word-embedding-based encoders rely on the induced graphs to encode useful, potentially long-distance dependencies, while RNN encoders make little or no use of them.

Recently it was shown that linguistic structure predicted by a supervised parser can be beneficial for neural machine translation (NMT). In this work we investigate a more challenging setup: we incorporate sentence structure as a latent variable in a standard NMT encoder-decoder and induce it in such a way as to benefit the translation task. We consider German-English and Japanese-English translation benchmarks and observe that when using RNN encoders the model makes no or very limited use of the structure induction apparatus. In contrast, CNN and word-embedding-based encoders rely on latent graphs and force them to encode useful, potentially long-distance, dependencies.
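To make the idea of a latent graph over source words more concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not specify how the graph is parameterized, so the dot-product scoring, the softmax normalization, and the function names `induce_soft_graph` and `propagate` are all assumptions chosen for illustration. The sketch only shows how a soft adjacency matrix computed from word embeddings could let each token draw information from distant tokens in a single step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def induce_soft_graph(embeddings):
    """Return a row-stochastic matrix A, where A[i][j] is a soft edge i -> j.

    Assumption: edge scores are plain dot products between embeddings.
    A real model would use learned parameters here.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two tokens to form edges")
    graph = []
    for i in range(n):
        # Score every other token j as a candidate neighbour of token i.
        others = [j for j in range(n) if j != i]
        scores = [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            for j in others
        ]
        weights = softmax(scores)
        row = [0.0] * n  # self-loops keep weight zero
        for j, w in zip(others, weights):
            row[j] = w
        graph.append(row)
    return graph


def propagate(embeddings, graph):
    """One message-passing step: add the weighted neighbour average to each token."""
    n = len(embeddings)
    dim = len(embeddings[0])
    updated = []
    for i in range(n):
        message = [
            sum(graph[i][j] * embeddings[j][d] for j in range(n))
            for d in range(dim)
        ]
        updated.append([embeddings[i][d] + message[d] for d in range(dim)])
    return updated


# Toy "sentence" of three tokens with 2-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_graph = induce_soft_graph(toy_embeddings)
toy_states = propagate(toy_embeddings, toy_graph)
```

Because edge weights depend only on the token pair and not on the distance between positions, a token can attend to a far-away word as easily as to its neighbour. This is one plausible reading of why encoders without recurrence would lean on such a graph.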
