Character-based NMT with Transformer
This work addresses the robustness of NMT to noisy and out-of-domain text. The contribution is incremental, since it builds on known techniques.
The paper addresses the performance gap between character-based neural machine translation (NMT) and BPE-based models by applying the Transformer architecture. It shows that character-based models are more robust to noisy text and to domain shift, and that deeper models reach comparable BLEU scores on clean, in-domain data.
Character-based translation has several appealing advantages, but its performance is in general worse than a carefully tuned BPE baseline. In this paper we study the impact of character-based input and output with the Transformer architecture. In particular, our experiments on EN-DE show that character-based Transformer models are more robust than their BPE counterparts, both when translating noisy text and when translating text from a different domain. To obtain comparable BLEU scores on clean, in-domain data and close the gap with BPE-based models, we use known techniques to train deeper Transformer models.
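To make "character-based input" and "noisy text" concrete, the following is a minimal sketch in Python. It is an illustration only, not the paper's pipeline. The function names, the `<space>` symbol, and the adjacent-letter swap noise are all assumptions made here, and the abstract does not say which noise types were used.

```python
import random


def char_tokenize(sentence, space_symbol="<space>"):
    """Split a sentence into one token per character.

    Spaces are replaced by an explicit symbol (an assumed convention)
    so that word boundaries survive tokenization.
    """
    return [space_symbol if ch == " " else ch for ch in sentence]


def char_detokenize(tokens, space_symbol="<space>"):
    """Invert char_tokenize by joining tokens and restoring spaces."""
    return "".join(" " if tok == space_symbol else tok for tok in tokens)


def swap_noise(sentence, prob, rng):
    """Inject synthetic typos by swapping adjacent letters.

    Each pair of neighbouring alphabetic characters is swapped with
    probability `prob`. A swapped pair is skipped afterwards, so no
    character moves more than one position. This is a hypothetical
    noise model chosen for illustration.
    """
    chars = list(sentence)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the noise is reproducible
    clean = "the cat sat on the mat"
    noisy = swap_noise(clean, prob=0.2, rng=rng)
    print(char_tokenize(clean)[:6])
    print(noisy)
```

A swap changes only two adjacent tokens in the character sequence, whereas a subword segmenter may split the misspelled word into a different and possibly rare set of units. This is one intuition often given for the robustness result, though the abstract itself does not state the mechanism.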