Improving Machine Translation with Phrase Pair Injection and Corpus Filtering
This work addresses translation quality for low-resource language pairs, but its contribution is incremental, since it builds on existing augmentation and filtering techniques.
The paper improves neural machine translation for low-resource languages by combining phrase pair injection with corpus filtering, yielding gains of up to 2.7 BLEU points on the FLORES test data across three language pairs (six translation directions).
In this paper, we show that the combination of Phrase Pair Injection and Corpus Filtering boosts the performance of Neural Machine Translation (NMT) systems. We extract parallel phrases and sentences from the pseudo-parallel corpus and combine them with the parallel corpus to train the NMT models. With the proposed approach, we observe improvements of up to 2.7 BLEU points on the FLORES test data for three low-resource language pairs (Hindi-Marathi, English-Marathi, and English-Pashto) in all six translation directions. These improvements are measured against models trained on the whole pseudo-parallel corpus augmented with the parallel corpus.
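The data-assembly step described above can be illustrated with a minimal sketch. It assumes that each pseudo-parallel sentence pair and each extracted phrase pair already carries a precomputed quality score; the abstract does not say how pairs are scored or which cutoff is used, so the score field, the `threshold` value, and the function names `filter_corpus` and `build_training_set` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]                # (source, target)
ScoredPair = Tuple[str, str, float]   # (source, target, quality score)


def filter_corpus(scored_pairs: List[ScoredPair], threshold: float) -> List[Pair]:
    """Keep only the pairs whose score reaches the threshold.

    The scoring method is an assumption; any sentence-level quality
    or similarity score could be plugged in here.
    """
    return [(src, tgt) for src, tgt, score in scored_pairs if score >= threshold]


def build_training_set(
    parallel: List[Pair],
    pseudo_sentences: List[ScoredPair],
    phrase_pairs: List[ScoredPair],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[Pair]:
    """Combine the parallel corpus with filtered sentences and phrase pairs."""
    kept_sentences = filter_corpus(pseudo_sentences, threshold)
    kept_phrases = filter_corpus(phrase_pairs, threshold)

    # Concatenate in a fixed order and drop exact duplicates,
    # preserving the first occurrence of each pair.
    seen = set()
    combined: List[Pair] = []
    for pair in parallel + kept_sentences + kept_phrases:
        if pair not in seen:
            seen.add(pair)
            combined.append(pair)
    return combined
```

In this sketch the baseline mentioned in the abstract would correspond to skipping the filter and using every pseudo-parallel sentence pair, while the proposed setup trains on the output of `build_training_set`.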