CL · Oct 23, 2024

Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

arXiv:2410.17973v1 · 23 citations · h-index: 18 · Has code · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses translation quality for low-resource languages such as Marathi and Hindi. Its contribution is incremental, since it builds on existing APE and multilingual methods.

This study tackled the problem of improving machine translation quality for low-resource Indo-Aryan languages by developing a multilingual Automatic Post-Editing (APE) system, resulting in performance gains of up to 2.5 TER points over single-pair models and additional improvements through multi-task learning, data augmentation, and domain adaptation.

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit their linguistic similarity to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. The experimental results underline the complementary nature of APE and QE, and we also observe that QE-APE multi-task learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by $2.5$ and $2.39$ TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning ($+1.29$ and $+1.44$ TER points), data augmentation ($+0.53$ and $+0.45$ TER points) and domain adaptation ($+0.35$ and $+0.45$ TER points). We publicly release the synthetic data, code, and models developed during this study at https://github.com/cfiltnlp/Multilingual-APE.
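To make "TER points" and the APE triplet concrete, here is a minimal sketch, not the authors' code. It assumes a simplified TER that counts only word-level insertions, deletions, and substitutions (full TER also allows block shifts), scaled by reference length times 100, so lower is better. The `APETriplet` class and its `qe_label` method are hypothetical illustrations. They assume a sentence-level QE target computed as this simplified HTER between the MT output and its post-edit, a common convention in WMT-style QE that may differ from the paper's setup. English strings are used only for readability.

```python
from dataclasses import dataclass


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute).

    Simplification: unlike full TER, block shifts are not modeled.
    """
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,          # delete hypothesis token h
                curr[j - 1] + 1,      # insert reference token r
                prev[j - 1] + cost,   # match or substitute
            )
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Approximate TER in points: edits / reference length * 100 (no shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * word_edit_distance(hyp, ref) / len(ref)


@dataclass
class APETriplet:
    """Hypothetical container for one APE training example."""
    src: str  # source sentence (e.g. English)
    mt: str   # machine translation (e.g. Marathi or Hindi)
    pe: str   # post-edited version of mt

    def qe_label(self) -> float:
        """Assumed QE target: simplified HTER of mt measured against pe."""
        return simple_ter(self.mt, self.pe)


if __name__ == "__main__":
    t = APETriplet(src="...", mt="the cat sat on mat", pe="the cat sat on the mat")
    print(round(t.qe_label(), 2))  # one missing word out of six reference words
```

With one insertion against a six-word post-edit, the label is 100/6, roughly 16.67 points. On this scale, a system that lowers corpus TER from, say, 30.0 to 27.5 has improved by 2.5 TER points.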
