CL, Jul 29, 2019

CUNI Systems for the Unsupervised News Translation Task in WMT 2019

arXiv:1907.12664v11092 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses unsupervised machine translation in the news domain, presenting an incremental improvement focused on named-entity handling.

The paper tackles unsupervised news translation by building a seed phrase-based system initialized from cross-lingual embeddings, refining it with iterative back-translation, and training a neural system on the resulting synthetic parallel data. The system reaches a BLEU score of 15.3 on the German-Czech WMT19 task.

In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19). We follow the strategy of Artetxe et al. (2018b), creating a seed phrase-based system where the phrase table is initialized from cross-lingual embedding mappings trained on monolingual data, followed by a neural machine translation system trained on synthetic parallel data. The synthetic corpus was produced from a monolingual corpus by a tuned PBMT model refined through iterative back-translation. We further focus on the handling of named entities, i.e., the part of the vocabulary where the cross-lingual embedding mapping suffers most. Our system reaches a BLEU score of 15.3 on the German-Czech WMT19 shared task.
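
The abstract does not say how phrase-table entries are scored from the cross-lingual embeddings. As a rough illustration only, the sketch below assumes one common choice: a temperature-scaled softmax over cosine similarities between a source word vector and candidate target word vectors, both already mapped into a shared space. The function names, the temperature value, and the toy two-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def phrase_table_scores(src_vec, tgt_vecs, temperature=0.1):
    """Turn similarities into translation probabilities for one source entry.

    Assumption: scores are a softmax over cosine similarity divided by a
    temperature. The value 0.1 is an illustrative default, not a tuned one.
    """
    sims = {tgt: cosine(src_vec, vec) for tgt, vec in tgt_vecs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    max_sim = max(sims.values())
    exps = {tgt: math.exp((s - max_sim) / temperature) for tgt, s in sims.items()}
    total = sum(exps.values())
    return {tgt: e / total for tgt, e in exps.items()}


# Toy, hand-made vectors standing in for mapped German and Czech embeddings.
src = [1.0, 0.0]  # hypothetical vector for German "Hund"
tgt = {
    "pes": [0.9, 0.1],
    "kočka": [0.1, 0.9],
    "dům": [-1.0, 0.0],
}

scores = phrase_table_scores(src, tgt)
best = max(scores, key=scores.get)
print(best)
```

A lower temperature concentrates probability on the nearest neighbour, while a higher one spreads it across candidates. Under this kind of scoring, entries whose embeddings are poorly mapped (the abstract singles out named entities) would receive unreliable probabilities, which is consistent with the paper's decision to treat them separately.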

