CL, May 4, 2021

Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

arXiv:2105.01691v2, 715 citations
Originality: Incremental advance
AI Analysis

This paper asks why an existing data augmentation method works for low-resource translation. It is an incremental advance because it explains that method rather than proposing a new one.

The paper investigates why concatenation improves low-resource neural machine translation. It finds that the gain of about +1 BLEU across four language pairs comes from context diversity, length diversity, and position shifting rather than from discourse context.

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the improvement of about +1 BLEU across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
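To make the method concrete, here is a minimal Python sketch of concatenation-based augmentation. It assumes that each augmented example joins the source sides and the target sides of two training pairs with a space, and that the augmented examples are appended to the original data. The helper `concat_augment` and its `mode` values are hypothetical names chosen for illustration; this is not the authors' code or their exact recipe. The two modes show one way to separate discourse context from the other factors. Random pairing removes discourse links between the joined sentences, but it still places each sentence in new contexts and produces longer, more varied examples.

```python
import random
from typing import List, Tuple

# A parallel training example: (source sentence, target sentence).
Pair = Tuple[str, str]


def concat_augment(pairs: List[Pair], mode: str = "random", seed: int = 0) -> List[Pair]:
    """Hypothetical helper: return the original pairs followed by concatenated pairs.

    mode="consecutive": join each pair with the next pair in corpus order,
        so neighbouring sentences (and any discourse links) stay together.
    mode="random": join each pair with a randomly drawn partner, which breaks
        discourse links but still adds new contexts and longer examples.
    """
    if not pairs:
        return []
    rng = random.Random(seed)
    if mode == "consecutive":
        partners = list(zip(pairs, pairs[1:]))
    elif mode == "random":
        # The partner is drawn uniformly and may occasionally be the same pair.
        partners = [(p, pairs[rng.randrange(len(pairs))]) for p in pairs]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # The second sentence of each concatenation starts at a later token
    # position than it would on its own (one form of "position shifting").
    augmented = [(f"{s1} {s2}", f"{t1} {t2}") for (s1, t1), (s2, t2) in partners]
    return list(pairs) + augmented


if __name__ == "__main__":
    corpus = [("a b", "A B"), ("c", "C"), ("d e f", "D E F")]
    for src, tgt in concat_augment(corpus, mode="consecutive"):
        print(src, "|||", tgt)
```

With `mode="consecutive"` the three-sentence toy corpus yields five examples: the three originals plus two joined neighbours. With `mode="random"` it yields six, because every original pair receives one partner.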
