CL · Dec 5, 2014

On Using Very Large Target Vocabulary for Neural Machine Translation

arXiv:1412.2007v2 · 1035 citations
AI Analysis

This paper addresses a key bottleneck in neural machine translation for researchers and practitioners: it enables efficient use of large target vocabularies. The contribution is incremental in that it builds on existing neural translation frameworks.

The paper tackles the problem of large target vocabularies in neural machine translation, where training and decoding complexity grow in proportion to the number of target words. It proposes a training method based on importance sampling that allows a very large target vocabulary without increasing training complexity, and it keeps decoding efficient by selecting only a small subset of the full target vocabulary. Models trained this way outperform baseline small-vocabulary models; an ensemble of them achieves state-of-the-art BLEU on English->German translation and comes close to the state of the art on English->French.

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English->German translation and almost as high performance as state-of-the-art English->French translation system.
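The cost argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy scores, and the candidate set are hypothetical, and the sketch assumes a uniform proposal over the candidate subset so that the importance-sampling correction terms cancel and only the normalizer changes. It compares an exact softmax over the whole target vocabulary with one normalized over a small candidate subset.

```python
import math
import random


def full_softmax_prob(scores, target):
    """Exact probability of `target`; cost grows with the vocabulary size."""
    m = max(scores)  # subtract the max score for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    return math.exp(scores[target] - m) / z


def subset_softmax_prob(scores, target, candidates):
    """Probability of `target` normalized over a candidate subset only.

    Assumption of this sketch: the proposal is uniform over the subset,
    so no per-word correction term appears. Cost grows with the subset
    size, not with the full vocabulary size.
    """
    cand = set(candidates) | {target}  # the target word must be in the subset
    m = max(scores[i] for i in cand)
    z = sum(math.exp(scores[i] - m) for i in cand)
    return math.exp(scores[target] - m) / z


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 50000  # hypothetical "very large" target vocabulary
    scores = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    target = 123
    candidates = rng.sample(range(vocab_size), 500)  # hypothetical subset
    print("full softmax  :", full_softmax_prob(scores, target))
    print("subset softmax:", subset_softmax_prob(scores, target, candidates))
```

Because the subset normalizer sums over fewer words, the subset probability is never smaller than the exact one; how well a system built this way translates depends on how the subset is chosen, which this sketch does not model.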

Code Implementations: 1 repo
