Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation
This paper addresses a specific issue in NMT training data, namely fine-grained semantic divergences between source and target, and offers an incremental improvement over prior noise-handling methods.
The paper tackles the problem of fine-grained semantic divergences in parallel training data degrading neural machine translation (NMT) performance. It shows that models trained on synthetic divergences output degenerated text more often and are less confident in their predictions, and it introduces a divergent-aware framework that uses factors to improve translation quality and calibration on EN-FR tasks.
While it has been shown that Neural Machine Translation (NMT) is highly sensitive to noisy parallel training samples, prior work treats all types of mismatches between source and target as noise. As a result, it remains unclear how samples that are mostly equivalent but contain a small number of semantically divergent tokens impact NMT training. To close this gap, we analyze the impact of different types of fine-grained semantic divergences on Transformer models. We show that models trained on synthetic divergences output degenerated text more frequently and are less confident in their predictions. Based on these findings, we introduce a divergent-aware NMT framework that uses factors to help NMT recover from the degradation caused by naturally occurring divergences, improving both translation quality and model calibration on EN-FR tasks.
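To make the idea of "factors" concrete, the following is a minimal sketch of one way token-level divergence tags could be attached to a sentence and combined with token embeddings. The abstract does not specify how the factors are encoded or combined, so the label names (`EQ`, `DIV`), the `token|FACTOR` string format, the choice of concatenation, and both function names are assumptions made purely for illustration.

```python
from typing import List, Sequence

# Hypothetical factor labels: EQ marks a token judged equivalent to the
# other side of the sentence pair, DIV marks a semantically divergent token.
EQ, DIV = "EQ", "DIV"


def attach_factors(tokens: Sequence[str], divergent: Sequence[bool]) -> List[str]:
    """Pair each token with its divergence factor as 'token|FACTOR'."""
    if len(tokens) != len(divergent):
        raise ValueError("tokens and divergence flags must align one-to-one")
    return [f"{tok}|{DIV if flag else EQ}" for tok, flag in zip(tokens, divergent)]


def factored_embedding(token_vec: Sequence[float], factor_vec: Sequence[float]) -> List[float]:
    """Combine a token embedding with a factor embedding by concatenation.

    Concatenation is an assumption here; summing the two vectors is another
    common way to combine factor and token embeddings.
    """
    return list(token_vec) + list(factor_vec)


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "yesterday"]
    flags = [False, False, False, True]  # assume only the last token diverges
    print(attach_factors(tokens, flags))
    print(factored_embedding([0.1, 0.2], [1.0]))
```

In this sketch the divergence flags are given as input; in practice they would have to come from some token-level divergence detector, which is outside the scope of the example.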