CL · Oct 18, 2022

Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

arXiv:2210.09588v1291 citations · h-index: 17 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses performance enhancement in multilingual NLP tasks, offering an incremental improvement by leveraging translation artifacts more effectively.

The paper aims to improve multilingual sentence classification by using translations in both directions at once: source-to-target translation to create training data and target-to-source translation at inference. It finds that translation artifacts are the main source of the performance gain and proposes MUSC, a cross-lingual fine-tuning algorithm that uses SupCon and MixUp jointly.

Translation has played a crucial role in improving the performance on multilingual tasks: (1) to generate the target language data from the source language data for training and (2) to generate the source language data from the target language data for inference. However, prior works have not considered the use of both translations simultaneously. This paper shows that combining them can synergize the results on various multilingual sentence classification tasks. We empirically find that translation artifacts stylized by translators are the main factor of the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, considering translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves the performance. Our code is available at https://github.com/jongwooko/MUSC.
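The abstract names two training methods, SupCon (supervised contrastive learning) and MixUp, without giving their formulas. The sketch below shows the generic textbook form of each on toy vectors that stand in for sentence embeddings. It is not taken from the MUSC repository: the function names `mixup` and `supcon_loss`, the temperature value, and the toy data are all assumptions made for illustration, and how MUSC combines the two objectives is not specified here.

```python
import math


def mixup(x_a, x_b, y_a, y_b, lam):
    """Generic MixUp: interpolate two feature vectors and their one-hot labels.

    lam is the mixing weight in [0, 1]; it is usually sampled from a Beta
    distribution, but is passed in explicitly here to keep the sketch simple.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def _normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    For each anchor, every other example with the same label is a positive.
    The loss is the mean negative log-probability of the positives under a
    softmax over all non-anchor examples. temperature=0.1 is an assumed value.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {
            a: sum(u * v for u, v in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # log-sum-exp with the max subtracted for numerical stability
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy "sentence embeddings": two tight clusters (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matching_labels = [0, 0, 1, 1]    # labels agree with the clusters
mismatched_labels = [0, 1, 0, 1]  # labels cut across the clusters
```

With these toy vectors, `supcon_loss(toy_embeddings, matching_labels)` is smaller than `supcon_loss(toy_embeddings, mismatched_labels)`, because the contrastive objective rewards same-label examples that lie close together. Calling `mixup` with `lam=0.25` weights the first input by 0.25 and the second by 0.75.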

Code Implementations: 1 repo
