CL · May 22, 2020

Character-level Transformer-based Neural Machine Translation

arXiv:2005.11239v1 · 22 citations
AI Analysis

This work targets the training cost and translation quality of character-level NMT. It is incremental, building on existing Transformer and character-level approaches.

The paper tackles the high computational cost of character-level neural machine translation by proposing a Transformer-based architecture that trains on a single GPU and is 34% faster than the character-level Transformer. Translation quality is at least on par with that model on four WMT'15 language pairs, and the proposed model outperforms the subword-level model on FI-EN.

Neural machine translation (NMT) is nowadays commonly applied at the subword level, using byte-pair encoding. A promising alternative approach focuses on character-level translation, which simplifies processing pipelines in NMT considerably. This approach, however, must consider relatively longer sequences, rendering the training process prohibitively expensive. In this paper, we discuss a novel Transformer-based approach that we compare, in both speed and quality, to the Transformer at the subword and character levels, as well as to previously developed character-level models. We evaluate our models on 4 language pairs from WMT'15: DE-EN, CS-EN, FI-EN and RU-EN. The proposed novel architecture can be trained on a single GPU and is 34% faster than the character-level Transformer; still, the obtained results are at least on par with it. In addition, our proposed model outperforms the subword-level model in FI-EN and shows close results in CS-EN. To stimulate further research in this area and close the gap with subword-level NMT, we make all our code and models publicly available.
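
The abstract's point about longer sequences can be illustrated with a minimal sketch. It is not the paper's code: the example sentence is made up, and whitespace splitting is used only as a crude stand-in for subword segmentation (real byte-pair encoding would yield somewhat more tokens than words).

```python
def char_tokens(sentence: str) -> list:
    # Character-level: every character, including spaces, is one token.
    return list(sentence)


def word_tokens(sentence: str) -> list:
    # Assumption: whitespace split as a rough proxy for subword units (not BPE).
    return sentence.split()


# Hypothetical example sentence, chosen only for illustration.
sentence = "character-level translation simplifies the pipeline"

chars = char_tokens(sentence)
words = word_tokens(sentence)

# How many times longer the character-level sequence is than the word-level one.
ratio = len(chars) / len(words)

print(f"character tokens: {len(chars)}")
print(f"word tokens:      {len(words)}")
print(f"length ratio:     {ratio:.1f}x")
```

Because standard self-attention compares every position with every other position, its cost grows quadratically with sequence length, so a sequence that is several times longer is considerably more expensive to process.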

