CL Aug 30, 2024

Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation

Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral

arXiv:2408.17308v114.125 citationsh-index: 10

Originality: Incremental advance

AI Analysis

This paper tackles the rigidity of existing methods for increasing lexical diversity in literary machine translation. Instead of a uniform increase, it proposes a tailored approach that recovers the diversity lost in the translation process.

The paper addresses the loss of lexical diversity in machine translation of literature. It proposes reranking translation candidates with a classifier that distinguishes original from translated text. In an evaluation on 31 English-to-Dutch book translations, the approach reaches lexical diversity scores close to those of human translation for certain books.

Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT poses an issue in the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
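The reranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rerank`, `type_token_ratio`, `toy_originality_score`) are hypothetical, and the scorer is a toy stand-in for the trained classifier that the paper uses to distinguish original from translated text. Type-token ratio is included only as one common way to measure lexical diversity; the abstract does not say which metrics the paper reports.

```python
from typing import Callable, List, Sequence


def type_token_ratio(text: str) -> float:
    """Distinct tokens divided by total tokens (whitespace tokenization)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def rerank(
    candidates: Sequence[str],
    originality_score: Callable[[str], float],
) -> List[str]:
    """Order translation candidates from most to least 'original-like'.

    `originality_score` is assumed to return a higher value when a text
    looks more like original writing than like translated text. In the
    paper this role is played by a trained classifier.
    """
    return sorted(candidates, key=originality_score, reverse=True)


def toy_originality_score(text: str) -> float:
    """Placeholder scorer (assumption): reuses type-token ratio.

    A real system would call a classifier here instead.
    """
    return type_token_ratio(text)


candidates = [
    "the cat sat on the mat the cat",
    "a feline rested upon the rug",
]
best = rerank(candidates, toy_originality_score)[0]
print(best)
```

Passing the scorer in as a callable keeps the reranking step independent of how the score is produced, so the placeholder can be replaced by a classifier's predicted probability without changing `rerank`.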
