Semantic Prosody in Machine Translation: the English-Chinese Case of Passive Structures
This work addresses a linguistic accuracy issue in machine translation for English-Chinese and other language pairs, though the contribution is incremental: it fine-tunes existing models on a single structure.
The paper tackles the failure of machine translation models to handle semantic prosody, specifically the negative connotations of Chinese BEI passives. The authors fine-tune models such as OPUS-MT and NLLB-600M on a custom dataset. The fine-tuned models more accurately use BEI passives for unfavorable content and avoid them for neutral or favorable content, and in the multilingual NLLB-600M this behavior transfers to other language pairs such as Spanish-Chinese.
Semantic prosody is a collocational meaning formed through the co-occurrence of a linguistic unit and a consistent series of collocates, and it should be treated separately from semantic meaning. Since words that are literal translations of each other may have different semantic prosody, more attention should be paid to this linguistic property to generate accurate translations. However, current machine translation models cannot handle this problem. To bridge the gap, we propose an approach to teach machine translation models about the semantic prosody of a specific structure. We focus on Chinese BEI passives and create a dataset of English-Chinese sentence pairs designed to demonstrate the negative semantic prosody of BEI passives. We then fine-tune OPUS-MT, NLLB-600M and mBART50 models on our dataset for the English-Chinese translation task. Our results show that the fine-tuned MT models are better at using BEI passives when translating unfavourable content and at avoiding them for neutral and favourable content. Moreover, in NLLB-600M, which is a multilingual model, this knowledge of semantic prosody can be transferred from English-Chinese translation to other language pairs, such as Spanish-Chinese.
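The behaviour described above (BEI passives for unfavourable content, none for neutral or favourable content) could be scored roughly as in the sketch below. This is not the paper's evaluation procedure, which is not given here. The function names, the polarity labels, the example sentences, and the string-matching heuristic for detecting the marker 被 are all assumptions made for illustration; a real evaluation would more plausibly rely on a parser or manual annotation.

```python
from dataclasses import dataclass

BEI = "被"
# Assumption: a small, non-exhaustive list of words that contain the
# character 被 without acting as a passive marker (e.g. 被子, "quilt").
NON_PASSIVE_WORDS = ("被子", "棉被", "被褥")


def uses_bei_passive(translation: str) -> bool:
    """Crude heuristic: True if 被 remains after removing known non-passive words."""
    text = translation
    for word in NON_PASSIVE_WORDS:
        text = text.replace(word, "")
    return BEI in text


@dataclass
class Example:
    translation: str  # Chinese output from an MT model
    polarity: str     # hypothetical label: "unfavourable", "neutral" or "favourable"


def prosody_accuracy(examples: list[Example]) -> float:
    """Share of outputs whose BEI usage matches the expected prosody.

    A BEI passive is expected for unfavourable content and
    not expected for neutral or favourable content.
    """
    if not examples:
        raise ValueError("no examples to score")
    correct = 0
    for ex in examples:
        bei_expected = ex.polarity == "unfavourable"
        if uses_bei_passive(ex.translation) == bei_expected:
            correct += 1
    return correct / len(examples)


# Hypothetical model outputs, written for illustration only.
samples = [
    Example("他被老板批评了。", "unfavourable"),    # BEI with negative content: matches
    Example("他受到了老板的表扬。", "favourable"),  # no BEI with positive content: matches
    Example("他被老板表扬了。", "favourable"),      # BEI with positive content: mismatch
    Example("她买了一床新被子。", "neutral"),       # 被子 is a noun, not a passive: matches
]

score = prosody_accuracy(samples)
print(f"prosody accuracy: {score:.2f}")
```

A score like this only checks for the presence of the marker. It says nothing about whether the rest of the translation is adequate, so it would complement standard MT metrics, not replace them.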