CL | Aug 18, 2020

Very Deep Transformers for Neural Machine Translation

arXiv:2008.07772v2 | 110 citations | Has Code
AI Analysis

This work aims to improve the accuracy of machine translation systems. The contribution is incremental: it builds on the existing Transformer architecture rather than proposing a new one.

The paper tackles the challenge of training very deep Transformer models for Neural Machine Translation by introducing a simple initialization technique that stabilizes training. The resulting models, with up to 60 encoder layers, outperform 6-layer baselines by up to 2.5 BLEU and set new state-of-the-art results, such as 43.8 BLEU on WMT14 English-French.

We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.
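To give a rough sense of the scale gap between a 6-layer baseline and a model with 60 encoder layers and 12 decoder layers, the sketch below counts only the weight matrices in the attention and feed-forward sublayers of a standard Transformer. The widths `d_model=512` and `d_ff=2048` are assumptions chosen for illustration, not values taken from the abstract, and the function names are hypothetical. Embeddings, biases, and layer-norm parameters are left out, so the totals are approximate.

```python
# Rough parameter count for standard Transformer layer stacks.
# ASSUMPTIONS (not from the abstract): d_model=512, d_ff=2048.
# Embeddings, biases, and layer-norm parameters are ignored.


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Weights in one encoder layer: self-attention plus feed-forward."""
    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff   # two linear maps: d_model<->d_ff
    return attention + feed_forward


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    """Weights in one decoder layer: self-attention, cross-attention, FFN."""
    self_attention = 4 * d_model * d_model
    cross_attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * d_ff
    return self_attention + cross_attention + feed_forward


def stack_params(enc_layers: int, dec_layers: int,
                 d_model: int = 512, d_ff: int = 2048) -> int:
    """Total layer-stack weights for the given encoder/decoder depths."""
    return (enc_layers * encoder_layer_params(d_model, d_ff)
            + dec_layers * decoder_layer_params(d_model, d_ff))


if __name__ == "__main__":
    baseline = stack_params(enc_layers=6, dec_layers=6)
    deep = stack_params(enc_layers=60, dec_layers=12)
    print(f"6 enc / 6 dec layers:   {baseline:,} weights")
    print(f"60 enc / 12 dec layers: {deep:,} weights")
    print(f"ratio: {deep / baseline:.2f}x")
```

Under these assumed widths, most of the growth comes from the encoder stack, which is where the abstract places most of the added depth.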

Code Implementations: 4 repos
