CL · Jun 11, 2019

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May

arXiv:1906.05683v1

Originality: Incremental advance

AI Analysis

The paper addresses the problem of building machine translation systems without parallel data, for both high- and low-resource languages. The advance is incremental, since it builds on existing unsupervised techniques.

The paper tackles unsupervised machine translation with a two-step approach: it first generates a rough gloss of the source sentence using a dictionary, then decodes that gloss into a fluent translation. It reports results better than or comparable to prior unsupervised MT studies on high-resource languages, and good-quality results on low-resource languages.

Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then 'translating' the resulting pseudo-translation, or 'Translationese', into a fully fluent translation. We build our Translationese decoder once from a mish-mash of parallel data that has the target language in common and then can build dictionaries on demand using unsupervised techniques, resulting in rapidly generated unsupervised neural MT systems for many source languages. We apply this process to 14 test languages, obtaining better or comparable translation results on high-resource languages than previously published unsupervised MT studies, and obtaining good quality results for low-resource languages that have never been used in an unsupervised MT scenario.
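The first of the two steps, dictionary glossing, can be illustrated with a minimal sketch. This is not the authors' implementation: the `gloss` function, the `TOY_DICTIONARY` entries, and the choice to copy unknown words through unchanged are all assumptions made for illustration. The second step, the Translationese decoder, is a trained neural model and is not reproduced here.

```python
from typing import Dict, List

# Hypothetical toy Spanish-to-English dictionary, for illustration only.
# The abstract says dictionaries are built with unsupervised techniques;
# this hand-written one merely stands in for such a dictionary.
TOY_DICTIONARY: Dict[str, str] = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "duerme": "sleeps",
}


def gloss(source_sentence: str, dictionary: Dict[str, str]) -> List[str]:
    """Step 1: replace each source word with its dictionary entry.

    Assumption: words missing from the dictionary are copied through
    unchanged, and tokenization is plain whitespace splitting.
    """
    tokens = source_sentence.lower().split()
    return [dictionary.get(token, token) for token in tokens]


# The gloss keeps source word order, so it reads as rough "Translationese".
translationese = gloss("El gato negro duerme", TOY_DICTIONARY)
print(" ".join(translationese))  # prints "the cat black sleeps"
```

The output keeps the source word order ("cat black" rather than "black cat"), which is the kind of rough pseudo-translation the second step is meant to turn into fluent text.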
