CL | ML | Aug 16, 2016

An Efficient Character-Level Neural Machine Translation

arXiv:1608.04738v2 | 3 citations | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a key efficiency problem in machine translation that matters to both researchers and practitioners, though the advance is incremental because it builds on existing encoder-decoder frameworks.

The paper tackles the large-vocabulary bottleneck in neural machine translation by proposing an efficient character-level architecture with a decimator and an interpolator. The result is training that is faster and more memory-efficient than conventional character-based models, plus the ability to handle misspelled words.

Neural machine translation aims to build a single large neural network that can be trained to maximize translation performance. The encoder-decoder architecture with an attention mechanism achieves translation performance comparable to existing state-of-the-art phrase-based systems on English-to-French translation. However, the use of a large vocabulary becomes the bottleneck in both training and improving performance. In this paper, we propose an efficient architecture for training a deep character-level neural machine translation model by introducing a decimator and an interpolator. The decimator samples the source sequence before encoding, while the interpolator resamples after decoding. Such a deep model has two major advantages. It fundamentally avoids the large-vocabulary issue; at the same time, it is much faster and more memory-efficient in training than conventional character-based models. More interestingly, our model is able to translate misspelled words, much as human beings can.
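
The abstract does not say how the decimator and interpolator are built, so the following is only a toy sketch of the downsample-then-upsample idea, not the authors' model. The whitespace segmentation, the mean pooling, the fixed character embedding, and the names `embed_char`, `decimate`, and `interpolate` are all assumptions made for illustration. It shows why sampling the character sequence before encoding shortens the sequence the encoder has to process.

```python
from typing import List, Tuple

Vector = List[float]


def embed_char(ch: str, dim: int = 4) -> Vector:
    """Hypothetical fixed embedding; a real model would use a learned lookup table."""
    code = ord(ch)
    return [((code * (i + 1)) % 17) / 17.0 for i in range(dim)]


def decimate(text: str, dim: int = 4) -> Tuple[List[Vector], List[int]]:
    """Downsample characters to one vector per whitespace-delimited segment.

    Assumption: mean pooling stands in for whatever learned sampling the paper uses.
    Returns the pooled vectors and the character length of each segment.
    """
    segments = text.split()
    pooled: List[Vector] = []
    for seg in segments:
        vecs = [embed_char(c, dim) for c in seg]
        pooled.append([sum(v[i] for v in vecs) / len(vecs) for i in range(dim)])
    return pooled, [len(seg) for seg in segments]


def interpolate(units: List[Vector], lengths: List[int]) -> List[Vector]:
    """Upsample back to character resolution by repeating each unit vector.

    Assumption: the per-segment lengths are given; a real decoder would have to
    generate the characters (and hence the lengths) itself.
    """
    out: List[Vector] = []
    for vec, n in zip(units, lengths):
        out.extend(list(vec) for _ in range(n))
    return out


if __name__ == "__main__":
    source = "the quick brown fox"
    units, lengths = decimate(source)
    chars = interpolate(units, lengths)
    print(len(source.replace(" ", "")), "characters ->", len(units), "units ->", len(chars), "slots")
```

In this sketch the encoder would see 4 units instead of 16 non-space characters, which is the kind of length reduction that makes character-level training cheaper.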

Code Implementations: 1 repo