Can Synthetic Translations Improve Bitext Quality?
This work addresses the problem of noisy mined bitext, which matters to machine translation researchers, but the contribution is incremental because it builds on existing synthetic translation methods.
The paper tackles bitext quality by using synthetic translations to revise imperfect reference translations in mined bitext. The quality gain is confirmed through human evaluation and through downstream bilingual induction and MT tasks.
Synthetic translations have been used for a wide range of NLP tasks primarily as a means of data augmentation. This work explores, instead, how synthetic translations can be used to revise potentially imperfect reference translations in mined bitext. We find that synthetic samples can improve bitext quality without any additional bilingual supervision when they replace the originals based on a semantic equivalence classifier that helps mitigate NMT noise. The improved quality of the revised bitext is confirmed intrinsically via human evaluation and extrinsically through bilingual induction and MT tasks.
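The selective replacement described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `revise_bitext`, `translate`, `equivalence_score`, and `margin` are hypothetical, and the decision rule (keep the synthetic translation only when it scores higher than the original reference) is an assumption about how a semantic equivalence classifier might be used to filter NMT noise.

```python
from typing import Callable, List, Tuple

# A bitext pair: (source sentence, target-side reference translation).
Pair = Tuple[str, str]


def revise_bitext(
    pairs: List[Pair],
    translate: Callable[[str], str],
    equivalence_score: Callable[[str, str], float],
    margin: float = 0.0,
) -> List[Pair]:
    """Replace a reference with a synthetic translation when it scores higher.

    `translate` stands in for an NMT system and `equivalence_score` for a
    semantic equivalence classifier; both are hypothetical callables supplied
    by the caller. `margin` is an assumed knob that makes replacement more
    conservative when set above zero.
    """
    revised: List[Pair] = []
    for source, reference in pairs:
        candidate = translate(source)
        original_score = equivalence_score(source, reference)
        candidate_score = equivalence_score(source, candidate)
        # Assumed rule: only replace when the synthetic sample is judged
        # more equivalent to the source than the original reference.
        if candidate_score > original_score + margin:
            revised.append((source, candidate))
        else:
            revised.append((source, reference))
    return revised
```

Because both components are passed in as callables, the sketch needs no bilingual supervision beyond whatever the NMT system and the classifier already rely on. Originals that the classifier already scores at least as high as the synthetic candidate are left untouched.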