CL · Apr 5, 2020

Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation

arXiv:2004.02071v1 · 14 citations
AI Analysis

This paper addresses the challenge of improving translation quality in low-resource settings, an incremental advance for machine learning in natural language processing.

The paper tackles the problem of low-resource neural machine translation by incorporating bilingual dictionaries to generate synthetic sentences, showing an appreciable improvement in performance over strong baselines.

We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, this could adversely impact the synthetically generated sentences in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
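
The word-by-word augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the translation direction, tokenization, or how multiple dictionary entries and unknown words are handled. The sketch assumes whitespace tokenization, takes the first listed translation, copies unknown words through unchanged, and translates target-side monolingual sentences into the source language. The function names and the toy German-to-English dictionary are hypothetical.

```python
def word_by_word(sentence, dictionary):
    """Translate a whitespace-tokenized sentence one word at a time."""
    out = []
    for token in sentence.split():
        # Assumption: use the first listed translation; copy unknown words as-is.
        candidates = dictionary.get(token.lower())
        out.append(candidates[0] if candidates else token)
    return " ".join(out)


def make_synthetic_pairs(monolingual_target, target_to_source):
    """Pair each genuine target sentence with a synthetic source sentence."""
    return [(word_by_word(t, target_to_source), t) for t in monolingual_target]


# Toy dictionary, made up for illustration only.
toy_dict = {"das": ["the"], "haus": ["house"], "ist": ["is"], "klein": ["small"]}
pairs = make_synthetic_pairs(["Das Haus ist klein", "Das Auto ist klein"], toy_dict)
```

The resulting pairs could then be mixed with the genuine parallel corpus for training, in the same way back-translated data usually is. Copying unknown words through is only one possible choice; dropping them or skipping the sentence are alternatives.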

