CL, Apr 29, 2020

Syntax-aware Data Augmentation for Neural Machine Translation

arXiv:2004.14200v1, 22 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective data augmentation in NMT, offering a domain-specific enhancement for translation tasks.

The paper proposes a syntax-aware data augmentation method for neural machine translation that uses dependency parse trees to set sentence-specific word selection probabilities. The authors report significant translation performance improvements on the WMT14 English-to-German and IWSLT14 German-to-English datasets.

Data augmentation is an effective way to enhance performance in neural machine translation (NMT) by generating additional bilingual data. In this paper, we propose a novel data augmentation strategy for neural machine translation. Unlike existing data augmentation methods, which simply choose words for modification with the same probability across different sentences, we set a sentence-specific probability for word selection by considering each word's role in the sentence. We use the dependency parse tree of the input sentence as an effective clue to determine the selection probability of every word in each sentence. Our proposed method is evaluated on the WMT14 English-to-German dataset and the IWSLT14 German-to-English dataset. The results of extensive experiments show that our proposed syntax-aware data augmentation method can effectively boost existing sentence-independent methods, yielding significant improvements in translation performance.
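The idea of sentence-specific selection probabilities can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific below is an assumption made for illustration: the depth-based weight `1 - 0.5 ** depth`, the `base_rate` parameter, the `<unk>` placeholder, and the function names are hypothetical and are not taken from the paper. The sketch assumes the dependency parse is already available as a list of head indices forming a valid tree.

```python
import random


def dependency_depths(heads):
    """Return each token's distance from the root.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    Assumes the heads describe a valid tree (no cycles).
    """
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:
            node = heads[node]
            depth += 1
        depths.append(depth)
    return depths


def selection_probabilities(heads, base_rate=0.15):
    """Assign a per-token probability of being selected for modification.

    Assumption (not from the paper): tokens deeper in the tree are treated
    as less central and get a larger weight, 1 - 0.5 ** depth, so the root
    is never selected. Weights are rescaled so that the expected number of
    selected tokens is base_rate * sentence_length, as it would be under
    uniform selection, then capped at 1.0.
    """
    n = len(heads)
    weights = [1.0 - 0.5 ** d for d in dependency_depths(heads)]
    total = sum(weights)
    if total == 0:  # e.g. a one-word sentence: fall back to uniform
        return [base_rate] * n
    return [min(1.0, base_rate * n * w / total) for w in weights]


def syntax_aware_replace(tokens, heads, base_rate=0.15,
                         placeholder="<unk>", rng=None):
    """Replace each token with a placeholder using its own probability."""
    rng = rng or random.Random()
    probs = selection_probabilities(heads, base_rate)
    return [placeholder if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


# Hand-written parse for illustration: "sat" is the root.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
heads = [1, 2, -1, 5, 5, 2]
probs = selection_probabilities(heads)
augmented = syntax_aware_replace(tokens, heads, rng=random.Random(0))
```

A sentence-independent baseline would use `base_rate` for every token. Here the same total selection budget is redistributed within the sentence, so the root verb is left intact while deeper tokens are modified more often. Replacement with a placeholder is only one possible modification; the same probabilities could drive other word-level operations.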

