CL · Jul 12, 2019

The University of Edinburgh's Submissions to the WMT19 News Translation Task

arXiv:1907.05854v1 · 1096 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation for specific language pairs, but it is incremental, building on existing methods like back-translation.

The University of Edinburgh tackled machine translation in six language directions for the WMT19 News Translation Task, using back-translation of target-language monolingual data in every direction alongside direction-specific techniques. For German-to-English, its study of very large amounts of back-translated training data yielded a few additional insights over Edunov et al. (2018).

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
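
The back-translation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`back_translate`, `build_training_set`) and the toy reverse model are hypothetical placeholders, and a real system would call a trained target-to-source translation model instead. The sketch only shows how target-language monolingual sentences are paired with machine-generated source sentences and added to the genuine parallel data.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each target-language sentence with a synthetic source sentence.

    `reverse_translate` stands in for a target-to-source MT model
    (hypothetical; supplied by the caller).
    """
    return [(reverse_translate(sentence), sentence) for sentence in target_monolingual]


def build_training_set(
    parallel: List[SentencePair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Concatenate genuine parallel data with back-translated synthetic pairs."""
    return list(parallel) + back_translate(target_monolingual, reverse_translate)


def toy_reverse_model(sentence: str) -> str:
    # Placeholder only: a real reverse model would output a source-language
    # translation rather than a tagged copy of the input.
    return "<synthetic> " + sentence


parallel = [("Guten Morgen", "Good morning")]
mono = ["The weather is nice today.", "See you tomorrow."]
train = build_training_set(parallel, mono, toy_reverse_model)
```

The target side of every synthetic pair is human-written text, so the forward model still learns to produce fluent output even when the synthetic source side is noisy.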

