VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022
This work addresses machine translation for the Chinese-Vietnamese language pair in both directions, reporting systems that outperform several strong baselines.
The paper tackles Chinese-Vietnamese and Vietnamese-Chinese machine translation by building systems based on Transformer and mBART, enhanced with backtranslation, ensembling, and postprocessing, achieving 38.9 and 38.0 BLEU, respectively, on the public test sets.
We present the systems we submitted to the VLSP 2022 machine translation shared task. This year we participated in both translation directions, i.e., Chinese-Vietnamese and Vietnamese-Chinese. We build our systems on the neural Transformer model, initialized from the multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method for backtranslation, which leverages large-scale available monolingual data. Additionally, several other methods are applied to improve translation quality, including ensembling and postprocessing. We achieve 38.9 BLEU on Chinese-Vietnamese and 38.0 BLEU on Vietnamese-Chinese on the public test sets, outperforming several strong baselines.
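To make the two decoding-side ideas concrete, the following is a minimal sketch of one decoding step, not the authors' implementation. The abstract does not specify the sampling scheme or how the ensemble combines models, so this sketch assumes unrestricted sampling from the output distribution for backtranslation and a plain average of per-model probabilities for ensembling. The function names `softmax`, `ensemble_probs`, and `pick_token` are hypothetical, and the logits are toy values.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(per_model_logits):
    """Assumed ensembling rule: average the per-model distributions for one step."""
    dists = [softmax(logits) for logits in per_model_logits]
    n_models = len(dists)
    vocab_size = len(dists[0])
    return [sum(d[i] for d in dists) / n_models for i in range(vocab_size)]


def pick_token(probs, sample, rng):
    """Sample a token id (assumed for backtranslation) or take the argmax."""
    if sample:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy logits over a 4-token vocabulary from two hypothetical models.
    step_logits = [[2.0, 0.5, 0.1, -1.0], [1.5, 1.0, 0.0, -0.5]]
    probs = ensemble_probs(step_logits)
    print("argmax token:", pick_token(probs, sample=False, rng=rng))
    print("sampled tokens:", [pick_token(probs, sample=True, rng=rng) for _ in range(5)])
```

Sampling produces more varied synthetic source sentences than always taking the most likely token, which is the usual motivation for using it when generating backtranslated data; the argmax branch stands in for deterministic decoding at test time.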