CL · Jul 12, 2019

The University of Edinburgh's Submissions to the WMT19 News Translation Task

Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, Alexandra Birch

arXiv:1907.05854

Originality: Synthesis-oriented

AI Analysis

This work addresses machine translation for specific language pairs, but it is incremental, building on existing methods like back-translation.

The University of Edinburgh submitted systems for six language directions in the WMT19 News Translation Task, using back-translated monolingual data as synthetic training data in all of them. For German-to-English, studying the effect of very large amounts of back-translated data yielded a few additional insights over Edunov et al. (2018).

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
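As a rough illustration of the back-translation step mentioned in the abstract, the sketch below pairs genuine target-language sentences with machine-generated source sentences and adds them to the genuine parallel data. It is a minimal sketch, not the submitted systems' pipeline: `back_translate`, `build_training_set`, and `toy_reverse_model` are hypothetical names, and the toy "model" simply upper-cases its input as a stand-in for a trained target-to-source translation system.

```python
from typing import Callable, Iterable, List, Tuple

# (source sentence, target sentence)
Pair = Tuple[str, str]


def back_translate(
    target_monolingual: Iterable[str],
    translate_tgt_to_src: Callable[[str], str],
) -> List[Pair]:
    """Pair each genuine target sentence with a synthetic source sentence."""
    pairs: List[Pair] = []
    for tgt in target_monolingual:
        tgt = tgt.strip()
        if not tgt:
            continue  # skip empty lines in the monolingual corpus
        synthetic_src = translate_tgt_to_src(tgt)
        # The source side is synthetic; the target side is genuine text.
        pairs.append((synthetic_src, tgt))
    return pairs


def build_training_set(parallel: Iterable[Pair], synthetic: Iterable[Pair]) -> List[Pair]:
    """Concatenate genuine parallel data with synthetic back-translated data."""
    return list(parallel) + list(synthetic)


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a target-to-source MT model (assumption)."""
    return sentence.upper()


if __name__ == "__main__":
    genuine = [("hello", "hallo")]
    mono_target = ["guten tag", "  ", "danke"]
    synthetic = back_translate(mono_target, toy_reverse_model)
    for src, tgt in build_training_set(genuine, synthetic):
        print(f"{src}\t{tgt}")
```

In a real setting the stand-in function would be replaced by a model trained in the reverse direction, and the ratio of synthetic to genuine data becomes a tuning decision, which is the aspect the German-to-English experiments examine.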
