CL · May 12, 2022

Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Meta AI

arXiv:2205.05990v131.9639 citations · h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses the problem of controlling formality in translation for low-resource languages. The advance is incremental: it builds on existing methods and adapts them to this specific task.

The paper tackles formality control in low-resource neural machine translation using domain adaptation and re-ranking. It reports an average accuracy of 0.935 (constrained) and 0.995 (unconstrained) for English-to-German and English-to-Spanish, and 0.590 (constrained) and 0.659 (unconstrained) for zero-shot English-to-Russian and English-to-Italian.

This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality control with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 within the constrained setting and .995 within the unconstrained setting. In a zero-shot setting for English-to-Russian and English-to-Italian, we scored an average accuracy of .590 for the constrained setting and .659 for the unconstrained setting.
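
The abstract does not spell out how the re-ranking objective is defined, so the following is only a minimal sketch of the general idea of formality-aware hypothesis re-ranking, not the authors' implementation. It assumes an n-best list with model log-probabilities is already available, and it uses a toy German marker lexicon; the names `Hypothesis`, `formality_score`, `rerank`, the marker sets, and the `weight` parameter are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Toy marker lexicons for German (assumption, not taken from the paper).
# Formal pronouns are matched case-sensitively because capitalised "Sie"
# signals formal address; informal pronouns are matched case-insensitively.
FORMAL_MARKERS = {"Sie", "Ihnen", "Ihr", "Ihre"}
INFORMAL_MARKERS = {"du", "dich", "dir", "dein", "deine"}


@dataclass
class Hypothesis:
    text: str
    log_prob: float  # model score, e.g. from beam search


def formality_score(text: str, target: str) -> int:
    """Markers agreeing with the target formality minus markers contradicting it."""
    tokens = [t.strip(".,!?") for t in text.split()]
    formal = sum(t in FORMAL_MARKERS for t in tokens)
    informal = sum(t.lower() in INFORMAL_MARKERS for t in tokens)
    return formal - informal if target == "formal" else informal - formal


def rerank(hyps: list[Hypothesis], target: str, weight: float = 1.0) -> Hypothesis:
    """Pick the hypothesis maximising model score plus a weighted formality bonus."""
    return max(
        hyps,
        key=lambda h: h.log_prob + weight * formality_score(h.text, target),
    )


if __name__ == "__main__":
    n_best = [
        Hypothesis("Kannst du mir helfen?", -1.0),
        Hypothesis("Können Sie mir helfen?", -1.5),
    ]
    print(rerank(n_best, "formal").text)
    print(rerank(n_best, "informal").text)
```

With `weight=0` the selection falls back to the highest-scoring model hypothesis; raising `weight` trades model confidence for agreement with the requested formality. A real system would likely replace the marker lexicon with a learned formality classifier.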
