Audience-specific Explanations for Machine Translation
This work addresses cultural incomprehension in machine translation for diverse audiences, though it is an incremental step focused on dataset creation.
The paper tackles the problem of identifying words or phrases in machine translation that require cultural explanations for the target audience, and proposes a semi-automatic method to extract such explanations from parallel corpora. On English->German, the method raises the share of sentences containing explanations from 1.9% in the original corpus to over 10% in the extracted set; experiments on English->French and English->Chinese lead to similar conclusions.
In machine translation, a common problem is that certain words, even when translated, remain incomprehensible to the target-language audience because of differing cultural backgrounds. One solution is to add explanations for these words. As a first step, we therefore need to identify the words or phrases that require explanation. In this work, we explore techniques to extract example explanations from a parallel corpus. However, sentences containing words that need to be explained are sparse, which makes building a training dataset extremely difficult. We therefore propose a semi-automatic technique to extract these explanations from a large parallel corpus. Experiments on the English->German language pair show that our method extracts a set of sentences of which more than 10% contain an explanation, whereas only 1.9% of the original sentences do. Experiments on the English->French and English->Chinese language pairs lead to similar conclusions, showing that the technique is robust across all three language pairs. This is therefore an essential first automatic step toward creating an explanation dataset.
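The abstract does not describe the extraction technique itself, so the following is only a minimal sketch of what an automatic pre-filter over a parallel corpus could look like. It assumes a purely hypothetical heuristic: a parenthetical that appears in the target sentence while the source sentence has none is treated as a candidate explanation. The names `candidate_explanations`, `filter_corpus`, and `explanation_rate`, as well as the toy sentence pairs, are illustrative and not taken from the paper.

```python
import re

# Hypothetical heuristic (an assumption, not the paper's method):
# a parenthetical in the target with no parenthetical in the source
# is treated as a candidate explanation added by the translator.
PAREN = re.compile(r"\(([^()]+)\)")


def candidate_explanations(source, target):
    """Return parenthetical spans of the target if the source has none."""
    if PAREN.search(source):
        # The source already contains a parenthetical, so the target one
        # is probably a translation rather than an added explanation.
        return []
    return [span.strip() for span in PAREN.findall(target)]


def filter_corpus(pairs):
    """Keep only sentence pairs that yield at least one candidate."""
    kept = []
    for source, target in pairs:
        candidates = candidate_explanations(source, target)
        if candidates:
            kept.append((source, target, candidates))
    return kept


def explanation_rate(num_with_explanation, num_total):
    """Share of sentences that contain an explanation."""
    return num_with_explanation / num_total if num_total else 0.0


# Toy English->German pairs, invented for illustration only.
pairs = [
    ("He works at the FBI.",
     "Er arbeitet beim FBI (der Bundespolizei der USA)."),
    ("The weather is nice.", "Das Wetter ist schön."),
    ("She left (reluctantly).", "Sie ging (widerwillig)."),
]

kept = filter_corpus(pairs)
for source, target, candidates in kept:
    print(source, "->", candidates)
print("share of pairs kept:", explanation_rate(len(kept), len(pairs)))
```

In a semi-automatic setup, candidates kept by such a filter would still be checked by a human annotator, since a parenthetical is not necessarily a cultural explanation. The share of true explanations among the kept sentences is the kind of quantity the reported 1.9% and 10% figures describe.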