CL · Nov 3, 2019

Controlling Text Complexity in Neural Machine Translation

arXiv:1911.00835v11014 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for accessible translations for non-native speakers, though it is incremental in that it builds on existing MT and text simplification methods.

The paper tackles the problem of generating machine translation output tailored to different levels of target language proficiency. It introduces a multi-task sequence-to-sequence model that translates Spanish into English at an easier reading grade level, and shows that this model outperforms pipelines that translate and simplify independently.

This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high-quality dataset of news articles available in English and Spanish, written for diverse grade levels, and propose a method to align segments across comparable bilingual articles. The resulting dataset makes it possible to train multi-task sequence-to-sequence models that translate Spanish into English targeted at an easier reading grade level than the original Spanish. We show that these multi-task models outperform pipeline approaches that translate and simplify text independently.
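
As a rough illustration of how aligned, grade-annotated segments could feed a single sequence-to-sequence model, the sketch below turns each segment pair into a training example whose source carries a target-grade control token. This is a minimal sketch under stated assumptions, not the paper's implementation: the `AlignedSegment` schema, the `<grade_N>` token format, the task labels, and the toy sentences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlignedSegment:
    """One aligned Spanish/English segment pair (hypothetical schema)."""
    es_text: str
    es_grade: int  # reading grade level of the Spanish side
    en_text: str
    en_grade: int  # reading grade level of the English side


def grade_tag(grade: int) -> str:
    # Hypothetical control token naming the desired output grade level.
    return f"<grade_{grade}>"


def make_example(seg: AlignedSegment) -> Tuple[str, str, str]:
    """Return (task, tagged_source, target) for one aligned pair."""
    # Label the pair by whether the English side is easier than the Spanish side.
    task = "translate+simplify" if seg.en_grade < seg.es_grade else "translate"
    # Prepend the target grade so one model can be steered at inference time.
    source = f"{grade_tag(seg.en_grade)} {seg.es_text}"
    return task, source, seg.en_text


def build_examples(segments: List[AlignedSegment]) -> List[Tuple[str, str, str]]:
    """Convert all aligned segments into tagged training examples."""
    return [make_example(seg) for seg in segments]


# Toy data, invented for this sketch only.
segments = [
    AlignedSegment("El gato duerme.", 8, "The cat sleeps.", 5),
    AlignedSegment("Hola.", 3, "Hello.", 3),
]
examples = build_examples(segments)
```

By contrast, a pipeline approach would run a translation model and a separate simplification model in sequence. In the sketch, a single tagged source sequence stands in for both steps.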

Code Implementations: 1 repo