CL, LG · Jun 2, 2023

Text Style Transfer Back-Translation

arXiv:2306.01318v1223 citations · h-index: 39 · Has Code
Originality: Incremental advance
AI Analysis

TST BT addresses a bottleneck in machine translation of natural inputs and offers a general data augmentation method that is particularly useful for domain adaptation.

The paper tackles a limitation of Back Translation (BT) in machine translation: BT mainly improves the translation of translation-like inputs rather than natural inputs. It proposes Text Style Transfer Back-Translation (TST BT), which modifies the source side of BT data to be more natural, and reports significant improvements in translation performance across various language pairs.

Back Translation (BT) is widely used in machine translation, as it has proven effective for enhancing translation quality. However, BT mainly improves the translation of inputs that share a similar style (more specifically, translation-like inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data. By making the style of the source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT proves effective in domain adaptation, so this strategy can be regarded as a general data augmentation method. Our training code and text style transfer model are open-sourced.
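
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function `build_tst_bt_pairs` and the two toy stand-ins `toy_back_translate` and `toy_style_transfer` are hypothetical names introduced here, and the stand-ins only tag strings so the pipeline order is visible. In a real setup each would presumably be a trained model (a target-to-source translation model and a style transfer model).

```python
from typing import Callable, Iterable, List, Tuple

# A text-to-text function; in practice this would wrap a trained model.
TextFn = Callable[[str], str]


def build_tst_bt_pairs(
    target_monolingual: Iterable[str],
    back_translate: TextFn,
    style_transfer: TextFn,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) training pairs.

    Plain BT would stop after back_translate; the extra style_transfer
    step rewrites the machine-translated source toward a natural style.
    """
    pairs = []
    for target in target_monolingual:
        synthetic_source = back_translate(target)           # translation-like text
        natural_source = style_transfer(synthetic_source)   # more natural text
        pairs.append((natural_source, target))
    return pairs


# Hypothetical stand-ins (assumptions for illustration only): they mark the
# text with a style tag instead of performing real translation or rewriting.
def toy_back_translate(target: str) -> str:
    return "<mt> " + target.lower()


def toy_style_transfer(source: str) -> str:
    return source.replace("<mt> ", "<nat> ", 1)


if __name__ == "__main__":
    for src, tgt in build_tst_bt_pairs(
        ["Hello World"], toy_back_translate, toy_style_transfer
    ):
        print(src, "=>", tgt)
```

The resulting pairs would then be mixed with genuine parallel data to train the forward translation model, as in ordinary BT; only the source side of the synthetic pairs differs.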

Code Implementations: 1 repo