Controlling Neural Machine Translation Formality with Synthetic Supervision
This work addresses the need for audience-appropriate machine translation. It is incremental: it builds on existing multi-task models and contributes a novel training approach.
The paper tackles the problem of controlling formality in neural machine translation. It introduces a training scheme that generates synthetic triplets to compensate for the lack of formality-labeled bilingual data, yielding a model that outperforms existing ones at matching desired formality levels while preserving meaning.
This work aims to produce translations that convey source language content at a formality level that is appropriate for a particular audience. Framing this problem as a neural sequence-to-sequence task ideally requires training triplets consisting of a bilingual sentence pair labeled with target language formality. However, in practice, available training examples are limited to English sentence pairs of different styles and bilingual parallel sentences of unknown formality. We introduce a novel training scheme for multi-task models that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training. Comprehensive automatic and human assessments show that our best model outperforms existing models by producing translations that better match desired formality levels while preserving the source meaning.
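To make the idea of completing partial examples concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The `Triplet` type, `toy_formality_score`, `synthesize_triplets`, `triplets_from_style_pairs`, and the `back_translate` callable are all hypothetical names introduced for illustration. The sketch shows one plausible reading of "inferring the missing element." For a bilingual pair, the missing formality label is predicted by a scorer. For an English style pair, the missing source sentence is produced by a back-translation function. In a real system both would be model predictions recomputed during training. Here they are toy stand-ins so that the sketch runs, and the paper's exact inference procedure may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Triplet:
    """One training example: source sentence, desired formality, target sentence."""
    source: str
    formality: str  # "formal" or "informal"
    target: str


# Hypothetical toy scorer: returns a formality score in [0, 1] for an English
# sentence. A real system would use a trained classifier or the model itself.
INFORMAL_MARKERS = {"gonna", "wanna", "hey", "yeah", "lol"}


def toy_formality_score(sentence: str) -> float:
    tokens = [tok.strip(".,!?").lower() for tok in sentence.split()]
    if not tokens:
        return 0.5  # no evidence either way
    hits = sum(tok in INFORMAL_MARKERS for tok in tokens)
    return 1.0 - hits / len(tokens)


def synthesize_triplets(
    bilingual_batch: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
    threshold: float = 0.9,
) -> Iterator[Triplet]:
    """Bilingual pairs lack a formality label: infer it from the target side."""
    for source, target in bilingual_batch:
        label = "formal" if score(target) >= threshold else "informal"
        yield Triplet(source, label, target)


def triplets_from_style_pairs(
    style_batch: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],
) -> Iterator[Triplet]:
    """English (informal, formal) pairs lack a source sentence: infer one.

    Both English sentences share a meaning, so one inferred source is
    paired with each of them under the matching formality label.
    """
    for informal_en, formal_en in style_batch:
        source = back_translate(formal_en)
        yield Triplet(source, "formal", formal_en)
        yield Triplet(source, "informal", informal_en)


if __name__ == "__main__":
    bilingual = [("Tu viens ?", "Hey, are you gonna come?"),
                 ("Voulez-vous venir ?", "Would you like to come?")]
    for t in synthesize_triplets(bilingual, toy_formality_score):
        print(t)
```

Because the scorer and the back-translator are passed in as parameters, this sketch can recompute the inferred elements for every batch as the underlying model improves, which is one way to realize supervision generated "on the fly" rather than in a fixed preprocessing step.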