CL | Sep 7, 2022

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao

arXiv:2209.03316v3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses how to use pre-training effectively in resource-rich machine translation, a question relevant to NLP researchers and practitioners. The advance is incremental, since it builds on existing methods.

The paper tackles the observation that pre-training often fails to improve resource-rich neural machine translation over random initialization. Its analysis finds that pre-training improves generalization and the negative diversity of lexical choice rather than accuracy. By combining these complementary strengths with a model fusion algorithm based on optimal transport, the authors report substantial improvements in translation accuracy, generalization, and negative diversity on the WMT'17 English-Chinese and WMT'19 English-German benchmarks.

Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, on resource-rich NMT it usually fails to achieve notable gains over its Random-Initialization (RI) counterpart, and is sometimes even worse. We take the first step to investigate the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves NOT the accuracy, but the generalization, by achieving flatter loss landscapes than RI; 2) PT improves NOT the confidence of lexical choice, but the negative diversity, by assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementarities with a model fusion algorithm that utilizes optimal transport to align neurons between PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI complement each other well, achieving substantial improvements in translation accuracy, generalization, and negative diversity. Probing tools and code are released at: https://github.com/zanchangtong/PTvsRI.
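
To make the idea of aligning neurons with optimal transport before fusing two models more concrete, here is a minimal toy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes two small fully connected layers of equal width rather than Transformer NMT models, uses an entropy-regularized Sinkhorn solver as one possible way to compute the transport plan, and the names `sinkhorn`, `sq_dist`, and `fuse_layer` are hypothetical.

```python
import math


def sinkhorn(cost, eps=0.05, iters=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def sq_dist(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuse_layer(w_in_a, w_out_a, w_in_b, w_out_b, eps=0.05):
    """Align model B's hidden neurons to model A's, then average the weights.

    w_in_*:  one row of incoming weights per hidden neuron (hidden x inputs)
    w_out_*: one row per output unit (outputs x hidden)
    Assumption: both models have the same number of hidden neurons.
    """
    n = len(w_in_a)
    # Neurons with similar incoming weights are cheap to match.
    cost = [[sq_dist(a, b) for b in w_in_b] for a in w_in_a]
    plan = sinkhorn(cost, eps)
    # Rows of n * plan sum to about 1: a soft map from A-neuron i to B-neurons.
    soft = [[n * p for p in row] for row in plan]
    aligned_in = [
        [sum(soft[i][j] * w_in_b[j][k] for j in range(n)) for k in range(len(w_in_b[0]))]
        for i in range(n)
    ]
    aligned_out = [
        [sum(soft[i][j] * row[j] for j in range(n)) for i in range(n)]
        for row in w_out_b
    ]
    fused_in = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(w_in_a, aligned_in)]
    fused_out = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(w_out_a, aligned_out)]
    return fused_in, fused_out
```

In this sketch, if model B is just model A with its hidden neurons shuffled, the transport plan recovers the shuffle and the fused weights come out close to A's. Averaging the two models without alignment would instead mix unrelated neurons, which is the motivation for aligning first.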
