CL, Feb 19, 2021

Back Translation Survey for Improving Text Augmentation

Matthew Ciolino, David Noever, Josh Kalin

arXiv:2102.09708v20.2

Originality Synthesis-oriented

AI Analysis

This work addresses the need for scalable text augmentation to supply training data for large transformer models in NLP. Its contribution is incremental because it surveys existing techniques.

The paper investigates how back translation through 108 different languages affects text augmentation for NLP, analyzing the effect on various metrics and text embeddings.

Natural Language Processing (NLP) relies heavily on training data. Transformers, as they have gotten bigger, have required massive amounts of training data. To satisfy this requirement, text augmentation should be looked at as a way to expand your current dataset and to generalize your models. One text augmentation we will look at is translation augmentation. We take an English sentence and translate it to another language before translating it back to English. In this paper, we look at the effect of 108 different language back translations on various metrics and text embeddings.
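
The round trip described in the abstract (English to a pivot language and back to English) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Translator` signature, the `toy_translate` word-lookup stand-in, and the Jaccard token overlap used as a similarity measure are all assumptions made for this example. In practice `translate` would wrap a real machine translation system, and the paper's own metrics may differ.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# Any real machine translation backend could be wrapped to fit this shape.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, pivot: str, translate: Translator,
                   source: str = "en") -> str:
    """Translate text into the pivot language, then back into the source."""
    forward = translate(text, source, pivot)
    return translate(forward, pivot, source)


def augment(text: str, pivots: List[str], translate: Translator) -> List[str]:
    """Collect distinct back translations of text, one attempt per pivot."""
    seen = {text}
    results = []
    for pivot in pivots:
        candidate = back_translate(text, pivot, translate)
        if candidate not in seen:  # skip exact copies of earlier outputs
            seen.add(candidate)
            results.append(candidate)
    return results


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (an assumed, simple metric)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Toy stand-in for a translation model: a word-by-word lookup table.
# It exists only so the sketch runs without a network or a model download.
_TOY_TABLES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "fr"): {"the": "le", "big": "grand", "dog": "chien", "runs": "court"},
    ("fr", "en"): {"le": "the", "grand": "large", "chien": "dog", "court": "runs"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    table = _TOY_TABLES[(source, target)]
    return " ".join(table.get(word, word) for word in text.split())


original = "the big dog runs"
augmented = augment(original, ["fr"], toy_translate)
similarity = jaccard(original, augmented[0])
```

With the toy table, the round trip turns "big" into "large", so the augmented sentence is a paraphrase rather than a copy. Comparing each back translation against the original with a similarity score is one way to judge how much a given pivot language changes the text.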
