Extending Multilingual Machine Translation through Imitation Learning
This work addresses the challenge of incorporating low-resource languages into existing translation systems and represents an incremental advance for multilingual NLP applications.
The paper tackles the problem of extending a multilingual machine translation model to a new language when only parallel data between that language and English is available. It proposes Imit-MNMT, which frames the task as imitation learning, uses an expert model to generate pseudo-parallel corpora, and mitigates catastrophic forgetting, resulting in significant improvements in translation performance.
Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to extend large-scale MNMT models to incorporate a new language, enabling translations between this new language and all previously supported languages, even in the challenging scenario where only a parallel corpus between the new language and English is available. Previous methods, such as continued training on parallel data that includes the new language, often suffer from catastrophic forgetting, which degrades performance on the other languages. We propose a novel approach, Imit-MNMT, which treats this task as an imitation learning problem, a technique widely used in computer vision but less explored in natural language processing. Specifically, we leverage an expert model to generate pseudo-parallel corpora between the new language and the existing languages. We then introduce a data distribution imitation strategy using language-specific weighting, alongside a translation behavior imitation mechanism. Extensive experiments show that our approach significantly improves translation performance between the new and existing languages while mitigating catastrophic forgetting.
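The pseudo-parallel corpus construction and language-specific weighting mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the expert builds pseudo-parallel data by pivoting through English (translating the English side of each new-language/English pair into every existing language), and it treats language-specific weighting as a normalized weighted average of per-language losses. The names `build_pseudo_parallel`, `weighted_loss`, and `toy_expert`, along with the weight values, are hypothetical. The abstract does not say how Imit-MNMT derives its weights, and the translation behavior imitation mechanism is not modeled here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expert interface: (english_sentence, target_lang) -> translation.
Translator = Callable[[str, str], str]


def build_pseudo_parallel(
    new_en_pairs: List[Tuple[str, str]],
    existing_langs: List[str],
    expert: Translator,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assumed pivot strategy: translate the English side of each
    (new-language, English) pair into every existing language, producing
    (new-language, existing-language) pseudo-parallel pairs."""
    corpora: Dict[str, List[Tuple[str, str]]] = {lang: [] for lang in existing_langs}
    for new_sent, en_sent in new_en_pairs:
        for lang in existing_langs:
            corpora[lang].append((new_sent, expert(en_sent, lang)))
    return corpora


def weighted_loss(per_lang_loss: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-language losses with language-specific weights.
    The weights are placeholders; the real weighting scheme is not given."""
    total_w = sum(weights[lang] for lang in per_lang_loss)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[lang] * loss for lang, loss in per_lang_loss.items()) / total_w


def toy_expert(en: str, lang: str) -> str:
    """Stand-in for a real MNMT expert: just tags the English input."""
    return f"[{lang}] {en}"


pairs = [("sentence in new language", "hello world")]
corpora = build_pseudo_parallel(pairs, ["de", "fr"], toy_expert)
print(corpora["de"])  # [('sentence in new language', '[de] hello world')]
print(weighted_loss({"de": 2.0, "fr": 4.0}, {"de": 1.0, "fr": 3.0}))  # 3.5
```

In a real system, `toy_expert` would be replaced by the pretrained MNMT model itself, and the weighted loss would be computed over model outputs during training rather than over fixed numbers.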