CL · Sep 20, 2021

CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task

arXiv:2109.09354v1
AI Analysis

This work addresses translation for low-resource Indo-European languages. It reports gains from a shared multilingual model and describes experiments with multi-task learning.

The paper tackles multilingual low-resource translation for Indo-European languages with a shared multilingual model for Catalan into Romanian, Italian, and Occitan. It shows that joint modeling improves translation quality in each pair, and that character-level bilingual models are competitive for very similar languages such as Catalan-Occitan but less so for more distant ones.

This paper describes the Charles University submission to the Multilingual Low-Resource Translation for Indo-European Languages shared task at WMT21. We competed in translation from Catalan into Romanian, Italian and Occitan. Our systems are based on a shared multilingual model. We show that using a joint model for multiple similar language pairs improves translation quality in each pair. We also demonstrate that character-level bilingual models are competitive for very similar language pairs (Catalan-Occitan) but less so for more distant pairs. Finally, we describe our experiments with multi-task learning, where, aside from textual translation, the models are also trained to perform grapheme-to-phoneme conversion.
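The abstract does not say how one model serves several target languages plus an auxiliary task. A common approach is to prefix each source sentence with a tag naming the desired output and to mix all pairs into a single training corpus. That approach is assumed here and is not confirmed as the authors' method. The minimal Python sketch below illustrates it. All of the following are hypothetical: the tag strings `<2it>` and `<g2p>`, the helper names `lang_tag`, `char_segment` and `build_shared_corpus`, the `▁` space marker used for character-level segmentation, and the toy sentence and phoneme pairs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tag format for the auxiliary task; the paper's conventions may differ.
G2P_TAG = "<g2p>"


def lang_tag(lang: str) -> str:
    """Hypothetical tag asking the shared model to translate into `lang` (e.g. "<2it>")."""
    return f"<2{lang}>"


def char_segment(text: str, space_symbol: str = "\u2581") -> str:
    """Character-level segmentation: one token per character, spaces made visible."""
    return " ".join(space_symbol if ch == " " else ch for ch in text)


def build_shared_corpus(
    parallel: Dict[str, List[Tuple[str, str]]],
    g2p: Optional[List[Tuple[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Merge Catalan->X corpora (and optional G2P data) into one joint training set."""
    corpus: List[Tuple[str, str]] = []
    for tgt_lang, pairs in parallel.items():
        for src, tgt in pairs:
            # The target-language tag tells the single model which output language to produce.
            corpus.append((f"{lang_tag(tgt_lang)} {src}", tgt))
    # Auxiliary task: same model and vocabulary, but the target is a phoneme sequence.
    for text, phonemes in g2p or []:
        corpus.append((f"{G2P_TAG} {text}", phonemes))
    return corpus


if __name__ == "__main__":
    # Toy, illustrative data only.
    toy = build_shared_corpus(
        {"it": [("Bon dia", "Buongiorno")], "oc": [("Bon dia", "Bonjorn")]},
        g2p=[("bon dia", "b ɔ n d i ə")],
    )
    for src, tgt in toy:
        print(src, "->", tgt)
    print(char_segment("Bon dia"))
```

In this sketch, all pairs share one vocabulary and one set of parameters, so data for closely related targets can reinforce each other. A character-level bilingual variant would instead train on a single pair and apply something like `char_segment` to both sides in place of subword units.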
