CL, May 19, 2023

Pseudo-Label Training and Model Inertia in Neural Machine Translation

Benjamin Hsu, Anna Currey, Xing Niu, Maria Nădejde, Georgiana Dinu

arXiv:2305.11808v1

Originality: Incremental advance

AI Analysis

This paper addresses stability issues in NMT models used in translation applications, but the contribution is incremental because it builds on known PLT techniques.

The paper investigates pseudo-label training (PLT) in neural machine translation and shows that it improves a model's stability to input perturbations and to model updates, a set of properties the authors call model inertia.

Like many other machine learning applications, neural machine translation (NMT) benefits from over-parameterized deep neural models. However, these models have been observed to be brittle: NMT model predictions are sensitive to small input changes and can show significant variation across re-training or incremental model updates. This work studies a frequently used method in NMT, pseudo-label training (PLT), which is common to the related techniques of forward-translation (or self-training) and sequence-level knowledge distillation. While the effect of PLT on quality is well-documented, we highlight a lesser-known effect: PLT can enhance a model's stability to model updates and input perturbations, a set of properties we call model inertia. We study inertia effects under different training settings and we identify distribution simplification as a mechanism behind the observed results.
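
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a translation model can be treated as a plain function from a source string to a target string. The names `build_pseudo_labeled_data`, `exact_match_stability`, `toy_teacher`, and `toy_updated` are hypothetical, and exact-match agreement is only one illustrative way to quantify stability across a model update; the paper's own metrics may differ.

```python
from typing import Callable, List, Tuple

# Assumption: a translation model is any callable mapping source text to target text.
Translate = Callable[[str], str]


def build_pseudo_labeled_data(
    teacher: Translate, sources: List[str]
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher's own translation.

    In pseudo-label training, a student is then trained on these
    (source, teacher output) pairs instead of, or alongside, human references.
    """
    return [(src, teacher(src)) for src in sources]


def exact_match_stability(
    model_a: Translate, model_b: Translate, sources: List[str]
) -> float:
    """Fraction of inputs for which two models produce identical output.

    Illustrative stability measure only (hypothetical choice of metric).
    """
    if not sources:
        raise ValueError("sources must be non-empty")
    same = sum(model_a(s) == model_b(s) for s in sources)
    return same / len(sources)


# Toy stand-ins for real NMT systems (hypothetical, for illustration only).
def toy_teacher(src: str) -> str:
    return src.upper()


def toy_updated(src: str) -> str:
    # Simulates a model update that changes the output on some inputs.
    return src.upper() if len(src) % 2 == 0 else src.lower()


sources = ["ab", "abc", "abcd", "abcde"]
pseudo = build_pseudo_labeled_data(toy_teacher, sources)
stability = exact_match_stability(toy_teacher, toy_updated, sources)
```

In a real setting, the toy functions would be replaced by decoding calls to trained models, and the pseudo-labeled pairs would feed a standard training loop.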
