CL AI LG · Sep 17, 2021

Back-translation for Large-Scale Multilingual Machine Translation

arXiv:2109.08712v1651 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of scalable multilingual translation for researchers and practitioners, but it is incremental as it builds on existing back-translation techniques.

The paper builds a single multilingual machine translation system by extending back-translation methods from bilingual to multilingual settings. It placed second in the WMT-21 small tasks and reports that constrained sampling gives better performance and that smaller vocabularies perform better.

This paper illustrates our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). This work aims to build a single multilingual translation system with the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained by the constrained sampling method, which differs from the findings for bilingual translation. We also explore the effect of vocabularies and the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers a modest improvement. We submitted to both small tasks and achieved second place.
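The abstract does not define "constrained sampling". One common reading, assumed here and not confirmed by the source, is top-k sampling: when generating a synthetic source sentence, each token is drawn only from the k most probable candidates. The sketch below shows that idea on a toy probability list. The names `top_k_sample` and `make_synthetic_pairs` are hypothetical, and no real translation model is involved.

```python
import random


def top_k_sample(probs, k, rng):
    """Draw one token id from the k most probable entries of probs.

    Assumption: this stands in for "constrained sampling"; the paper's
    exact procedure is not given in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Keep the ids of the k highest-probability tokens.
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top_ids]
    # choices() treats the weights as relative, so no renormalization is needed.
    return rng.choices(top_ids, weights=weights, k=1)[0]


def make_synthetic_pairs(target_sentences, sample_source):
    """Pair each real target sentence with a sampled synthetic source.

    sample_source is a hypothetical callable wrapping a reverse
    (target-to-source) model that decodes with top_k_sample.
    """
    return [(sample_source(t), t) for t in target_sentences]
```

With k set to 1 this reduces to greedy decoding, and with k equal to the vocabulary size it becomes unrestricted sampling, so k controls how much noise the synthetic sources carry.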

Code Implementations: 1 repo
