CL · Jul 9, 2019

NTT's Machine Translation Systems for WMT19 Robustness Task

arXiv:1907.03927v11091 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses translation robustness for noisy social media data, but it is incremental as it builds on existing techniques for a specific competition task.

The paper tackles machine translation of noisy text, such as social media posts, for the WMT19 robustness task. It combines a synthetic corpus, domain adaptation, and a placeholder mechanism, which together improved significantly over the previous baseline. The placeholder mechanism in particular improved translation accuracy on text containing non-standard tokens.

This paper describes NTT's submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined several techniques, including the use of a synthetic corpus, domain adaptation, and a placeholder mechanism, which together significantly improved over the previous baseline. Experimental results revealed that the placeholder mechanism, which temporarily replaces non-standard tokens such as emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even on noisy text.
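The placeholder idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expression, the indexed `<PH0>`-style token format, and the function names `mask`, `unmask`, and `translate_with_placeholders` are all assumptions made for illustration, and the `translate` argument stands in for any translation system.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a deliberately small pattern for illustration only.
# Real emoji and emoticon coverage is far wider than this.
NON_STANDARD = re.compile(
    r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # a subset of emoji code points
    r"|[:;=][-']?[()DPp]"                    # a few ASCII emoticons, e.g. :) ;-D
)

# Hypothetical placeholder format: <PH0>, <PH1>, ...
PLACEHOLDER = re.compile(r"<PH(\d+)>")


def mask(text: str) -> Tuple[str, List[str]]:
    """Replace each non-standard token with an indexed placeholder."""
    saved: List[str] = []

    def _swap(match: "re.Match[str]") -> str:
        saved.append(match.group(0))
        return f"<PH{len(saved) - 1}>"

    return NON_STANDARD.sub(_swap, text), saved


def unmask(text: str, saved: List[str]) -> str:
    """Put the original tokens back in place of their placeholders."""

    def _restore(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than failing.
        return saved[index] if index < len(saved) else match.group(0)

    return PLACEHOLDER.sub(_restore, text)


def translate_with_placeholders(text: str, translate: Callable[[str], str]) -> str:
    """Mask non-standard tokens, translate, then restore the tokens."""
    masked, saved = mask(text)
    return unmask(translate(masked), saved)
```

With a toy stand-in translator such as `lambda s: s.replace("good morning", "おはよう")`, the emoticon and emoji in `"good morning :) 😀"` pass through unchanged while the surrounding words are translated. The sketch assumes the translation system copies placeholder tokens to its output intact; how the submission ensures this is not stated in the abstract.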

