CL · May 8, 2023

Target-Side Augmentation for Document-Level Machine Translation

arXiv:2305.04505v2225 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity in document-level machine translation with a novel approach to improving translation quality, though the advance is incremental because it builds on existing augmentation techniques.

The paper tackles data sparsity in document-level machine translation by proposing a target-side augmentation method that generates multiple translations for each source document to smooth the learned distribution. This yields a 2.30 s-BLEU improvement over the previous best system on News and new state-of-the-art results on the News and Europarl benchmarks.

Document-level machine translation faces the challenge of data sparsity due to its long input length and a small amount of training data, increasing the risk of learning spurious patterns. To address this challenge, we propose a target-side augmentation method, introducing a data augmentation (DA) model to generate many potential translations for each source document. By learning on this wider range of translations, an MT model can learn a smoothed distribution, thereby reducing the risk of data sparsity. We demonstrate that the DA model, which estimates the posterior distribution, largely improves the MT performance, outperforming the previous best system by 2.30 s-BLEU on News and achieving new state-of-the-art on News and Europarl benchmarks. Our code is available at https://github.com/baoguangsheng/target-side-augmentation.
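
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The names `augment_target_side` and `make_toy_da_model` are hypothetical, and the toy DA model is a stand-in that swaps words using a small hand-written synonym table, whereas the paper's DA model is a trained model that estimates the posterior distribution. The sketch only shows how one source document is paired with several candidate translations before MT training.

```python
import random
from typing import Callable, Dict, List, Tuple

# A parallel example: (source document, target document).
Pair = Tuple[str, str]
# Hypothetical DA model interface: given a source document and its reference
# translation, return one sampled alternative translation.
DAModel = Callable[[str, str], str]


def augment_target_side(
    corpus: List[Pair],
    da_model: DAModel,
    num_samples: int,
    keep_reference: bool = True,
) -> List[Pair]:
    """Pair every source document with several sampled translations.

    The source side is left unchanged; only the target side is varied.
    """
    augmented: List[Pair] = []
    for source, reference in corpus:
        if keep_reference:
            augmented.append((source, reference))
        for _ in range(num_samples):
            augmented.append((source, da_model(source, reference)))
    return augmented


def make_toy_da_model(seed: int = 0) -> DAModel:
    """Build a toy stand-in for a DA model (assumption, for illustration only).

    It randomly replaces some target words with entries from a tiny synonym
    table. A real DA model would be a trained translation model.
    """
    rng = random.Random(seed)
    synonyms: Dict[str, List[str]] = {
        "big": ["large", "huge"],
        "said": ["stated", "noted"],
    }

    def sample(source: str, reference: str) -> str:
        # The toy model ignores the source; a real DA model would condition on it.
        words = [rng.choice([w] + synonyms.get(w, [])) for w in reference.split()]
        return " ".join(words)

    return sample
```

An MT model would then be trained on the returned list instead of the original corpus, so that each source document is seen with several target variants rather than a single reference.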

Code Implementations: 1 repo
