CL Feb 11, 2021

Towards Personalised and Document-level Machine Translation of Dialogue

arXiv:2102.10979v132.7803 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for more accurate, context-aware translation of dialogue. Its contribution is incremental: it builds on emerging fields with limited prior work.

The paper tackles machine translation that ignores context, such as the speaker's gender and the preceding sentences. It focuses on personalised and document-level NMT for dialogue from TV subtitles in five languages. Its aims are to incorporate extra-textual information, improve the translation of cohesion devices, and develop reliable evaluation metrics.

State-of-the-art (SOTA) neural machine translation (NMT) systems translate texts at sentence level, ignoring context: intra-textual information, like the previous sentence, and extra-textual information, like the gender of the speaker. Because of that, some sentences are translated incorrectly. Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this information into the translation process. Both fields are relatively new and previous work within them is limited. Moreover, there are no readily available robust evaluation metrics for them, which makes it difficult to develop better systems, as well as track global progress and compare different methods. This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages: English, Brazilian Portuguese, German, French and Polish. Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; (3) reliable evaluation for PersNMT and DocNMT.
