CL LG | Jul 12, 2024

Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models

arXiv:2407.08978v11.91 citations | h-index: 6

Originality: Incremental advance

AI Analysis

The paper addresses context-aware literary translation, a problem relevant to both researchers and practitioners. The contribution is incremental because it builds on existing LLM methods.

The paper tackles two problems in document-level translation: sparse discourse phenomena and an unrealistic assumption of sentence-level alignment. It curates a dataset of 160 Chinese-English books and proposes a chapter-to-chapter (Ch2Ch) translation setting, in which finetuning large language models improves on the baselines.

Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, which consists of 160 books with intricate discourse structures. Then, we propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly-used machine translation models under this setting. Furthermore, we introduce a potential approach of finetuning large language models (LLMs) within the domain of Ch2Ch literary translation, yielding impressive improvements over baselines. Through our comprehensive analysis, we unveil that literary translation under the Ch2Ch setting is challenging in nature, with respect to both model learning methods and translation decoding algorithms.
