CL | Jul 24, 2017

Deep Architectures for Neural Machine Translation

Antonio Valerio Miceli Barone, Jindřich Helcl, Rico Sennrich, Barry Haddow, Alexandra Birch

arXiv:1707.07631v1 | 39.71142 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient, high-quality translation models. It is an incremental advance: it compares and extends existing approaches to adding depth to neural machine translation models.

The paper compares and extends deep architectures for neural machine translation. Its novel BiDeep RNN architecture, at a combined depth of 8, achieves an average improvement of 1.5 BLEU over a strong shallow baseline on the English-to-German WMT news translation dataset.

It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel "BiDeep" RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain best improvements with a BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.
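
To make the distinction between the two kinds of depth concrete, here is a minimal sketch of how a deep transition cell and a stack of such cells might fit together. It is not the authors' released code. Several details are assumptions that the abstract does not state: a plain tanh update stands in for a gated unit, only the first transition sub-step sees the external input, residual connections link the stacked layers, and "combined depth" is taken to be stack depth times transition depth. All class and function names are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DeepTransitionCell:
    """One time step built from several chained sub-steps (transition depth).

    Assumption: only the first sub-step receives the external input x;
    later sub-steps transform the intermediate state alone.
    Assumption: a tanh update replaces the gated unit a real system would use.
    """

    def __init__(self, input_size, hidden_size, transition_depth, rng):
        self.w_in = make_matrix(hidden_size, input_size, rng)
        self.w_rec = [make_matrix(hidden_size, hidden_size, rng)
                      for _ in range(transition_depth)]

    def step(self, x, h):
        for i, w in enumerate(self.w_rec):
            pre = matvec(w, h)
            if i == 0:  # inject the input at the first sub-step only
                pre = [a + b for a, b in zip(pre, matvec(self.w_in, x))]
            h = [math.tanh(a) for a in pre]
        return h


class BiDeepStack:
    """Stacked layers whose recurrent cells are deep transition cells."""

    def __init__(self, input_size, hidden_size, stack_depth,
                 transition_depth, seed=0):
        rng = random.Random(seed)
        sizes = [input_size] + [hidden_size] * (stack_depth - 1)
        self.hidden_size = hidden_size
        self.layers = [DeepTransitionCell(s, hidden_size, transition_depth, rng)
                       for s in sizes]

    def run(self, inputs):
        seq = inputs
        for idx, cell in enumerate(self.layers):
            h = [0.0] * self.hidden_size
            outputs = []
            for x in seq:
                h = cell.step(x, h)
                # Assumed residual connection between stacked layers
                # (skipped for the first layer, whose input size may differ).
                out = [a + b for a, b in zip(h, x)] if idx > 0 else h
                outputs.append(out)
            seq = outputs
        return seq


# Illustrative configuration: 4 stacked layers x 2 transitions per step.
model = BiDeepStack(input_size=3, hidden_size=4,
                    stack_depth=4, transition_depth=2)
states = model.run([[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]])
combined_depth = len(model.layers) * len(model.layers[0].w_rec)
```

In this sketch, transition depth adds computation between consecutive time steps inside one layer, while stack depth adds layers that each run over the whole sequence. Under the multiplicative reading assumed here, the 4 x 2 configuration has a combined depth of 8.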
