When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?
The paper addresses data scarcity in NMT and offers researchers and practitioners insight into when pre-trained embeddings help. The contribution is incremental, since it builds on existing embedding methods.
The paper investigates the effectiveness of pre-trained word embeddings in neural machine translation, particularly in low-resource scenarios, and finds that they can improve performance by up to 20 BLEU points in the most favorable setting.
The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases -- providing gains of up to 20 BLEU points in the most favorable setting.
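The abstract does not spell out how pre-trained embeddings enter an NMT model. One common approach, sketched below, is to initialize the embedding table from the pre-trained vectors and fall back to random initialization for words that have none. This is a minimal illustration, not the authors' implementation: the function name `build_embedding_table`, the toy vocabulary, the vectors, and the uniform range of 0.1 are all assumptions made for the example.

```python
import random


def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Build one embedding row per vocabulary word.

    Words found in `pretrained` (with the expected dimension) copy their
    pre-trained vector; all other words get a small random vector, as they
    would when training from scratch.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    table = []
    hits = 0
    for word in vocab:
        vector = pretrained.get(word)
        if vector is not None and len(vector) == dim:
            table.append(list(vector))  # copy so the source dict is not aliased
            hits += 1
        else:
            # Assumed initialization range; real systems choose their own.
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return table, hits


# Toy data (assumed): two of the four words have pre-trained vectors.
vocab = ["<unk>", "the", "cat", "sat"]
pretrained = {"the": [0.1, 0.2, 0.3], "cat": [0.4, 0.5, 0.6]}

table, hits = build_embedding_table(vocab, pretrained, dim=3)
coverage = hits / len(vocab)
print(f"pre-trained coverage: {coverage:.0%}")
```

The resulting table would then be loaded into the model's source or target embedding layer before training. The coverage figure is worth checking in low-resource settings, since words without a pre-trained vector gain nothing from the initialization.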