CL · Sep 13, 2022

Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

Nathaniel R. Robinson, Cameron J. Hogan, Nancy Fulda, David R. Mortensen

CMU

arXiv:2209.06295v131.0582 citations · h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses low-resource machine translation for Haitian and Jamaican, offering incremental improvements by matching transfer techniques to data characteristics.

The study analyzes data-adaptive transfer learning for low-resource machine translation. It finds that back-translation can become counterproductive beyond a threshold of authentic data, where cross-lingual transfer from a sufficiently related language is preferred. In very low-resource Jamaican translation, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.

Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and the relationships between knowledge-sharing languages. Our experiments suggest that for some languages, beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation makes statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
