CL, Sep 21, 2025

Extending Automatic Machine Translation Evaluation to Book-Length Documents

arXiv:2509.17249v1 | 3 citations | h-index: 17 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in machine translation evaluation for researchers and practitioners by enabling document-level assessment. The advance is incremental, however: it builds on existing metrics, adding sentence segmentation and alignment rather than proposing a new metric.

The paper tackles the evaluation of machine translation for long documents. It introduces SEGALE, an evaluation scheme that extends existing metrics to book-length texts. The authors report that SEGALE significantly outperforms existing long-form evaluation schemes, and that many open-weight LLMs fail to translate documents effectively at their reported maximum context lengths.

Despite Large Language Models (LLMs) demonstrating superior translation performance and long-context capabilities, evaluation methodologies remain constrained to sentence-level assessment due to dataset limitations, token number restrictions in metrics, and rigid sentence boundary requirements. We introduce SEGALE, an evaluation scheme that extends existing automatic metrics to long-document translation by treating documents as continuous text and applying sentence segmentation and alignment methods. Our approach enables previously unattainable document-level evaluation, handling translations of arbitrary length generated with document-level prompts while accounting for under-/over-translations and varied sentence boundaries. Experiments show our scheme significantly outperforms existing long-form document evaluation schemes, while being comparable to evaluations performed with ground-truth sentence alignments. Additionally, we apply our scheme to book-length texts and newly demonstrate that many open-weight LLMs fail to effectively translate documents at their reported maximum context lengths.
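
The segment-then-align idea in the abstract can be illustrated with a toy sketch. This is not the SEGALE implementation: the regex segmenter, the monotonic dynamic-programming aligner, the word-overlap F1 used as a stand-in metric, and the choice to score unaligned sentences as 0 are all assumptions made for illustration, and every function name below is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def split_sentences(text: str) -> List[str]:
    """Naive segmenter (assumption): split after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def token_f1(hyp: str, ref: str) -> float:
    """Toy stand-in for a sentence-level metric: F1 over unique lowercase words."""
    h = set(re.findall(r"\w+", hyp.lower()))
    r = set(re.findall(r"\w+", ref.lower()))
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def align(hyp: List[str], ref: List[str]) -> List[Pair]:
    """Monotonic alignment maximizing total similarity.

    Allowed moves: match one hypothesis sentence to one reference sentence,
    skip a hypothesis sentence (over-translation), or skip a reference
    sentence (under-translation).
    """
    n, m = len(hyp), len(ref)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                sim = token_f1(hyp[i - 1], ref[j - 1])
                candidates.append((best[i - 1][j - 1] + sim, (i - 1, j - 1)))
            if i > 0:
                candidates.append((best[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((best[i][j - 1], (i, j - 1)))
            best[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Walk the back-pointers from the end to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((i - 1 if pi < i else None, j - 1 if pj < j else None))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def document_score(hyp_doc: str, ref_doc: str) -> float:
    """Average the metric over aligned pairs; unaligned sentences count as 0."""
    hyp, ref = split_sentences(hyp_doc), split_sentences(ref_doc)
    pairs = align(hyp, ref)
    if not pairs:
        return 0.0
    scores = [
        token_f1(hyp[h], ref[r]) if h is not None and r is not None else 0.0
        for h, r in pairs
    ]
    return sum(scores) / len(scores)


reference = "The cat sat. The dog barked. The bird sang."
hypothesis = "The cat sat. The bird sang."  # middle sentence was dropped
alignment = align(split_sentences(hypothesis), split_sentences(reference))
score = document_score(hypothesis, reference)
```

In this example the dropped reference sentence surfaces as an unaligned entry `(None, 1)` and lowers the document score, which mirrors the abstract's point about accounting for under-translation. A real setup would replace `token_f1` with an established metric and use a proper sentence segmenter and aligner.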

