CL · May 31, 2021

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

arXiv:2105.15071v2 · 31.77 · 17 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the scarcity of parallel data for low-resource languages, a major obstacle in machine translation. The advance is incremental, however, since it builds on existing techniques such as denoising autoencoding and back-translation.

The paper tackles the problem of translating low-resource languages without parallel data by adapting high-resource NMT models using linguistic similarity, showing significant improvements in translation quality for 7 languages from three families.

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into low-resource languages compared to other translation baselines.
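As a rough illustration of the denoising autoencoding ingredient named in the abstract, the sketch below builds (noisy, clean) training pairs from monolingual sentences. It is not the authors' implementation: the function names `add_noise` and `make_denoising_pair`, the noise types (token dropping, masking, local shuffling), and all default probabilities are assumptions chosen for illustration, and the abstract does not specify which noise the paper uses.

```python
import random


def add_noise(tokens, drop_prob=0.1, mask_prob=0.1, shuffle_window=3,
              mask_token="<mask>", seed=None):
    """Corrupt a token list (hypothetical noise model, not from the paper)."""
    rng = random.Random(seed)

    # Step 1: randomly drop or mask individual tokens.
    kept = []
    for tok in tokens:
        r = rng.random()
        if r < drop_prob:
            continue                      # drop the token entirely
        elif r < drop_prob + mask_prob:
            kept.append(mask_token)       # replace the token with a mask
        else:
            kept.append(tok)              # keep the token unchanged

    # Step 2: local shuffle. Each position gets a jittered sort key, so a
    # token can only swap with neighbours fewer than shuffle_window away.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def make_denoising_pair(sentence, **noise_kwargs):
    """Return (noisy_input, clean_target) for one monolingual sentence."""
    tokens = sentence.split()
    return add_noise(tokens, **noise_kwargs), tokens


if __name__ == "__main__":
    noisy, clean = make_denoising_pair(
        "the model reconstructs the original sentence", seed=0)
    print("input :", " ".join(noisy))
    print("target:", " ".join(clean))
```

A model trained on such pairs learns to reconstruct the clean sentence from its corrupted version, which requires only monolingual text. How this objective is weighted against back-translation and the adversarial objective is not stated in the abstract and is left out of the sketch.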
