Can QE-informed (Re)Translation lead to Error Correction?
This work addresses overcorrection in QE-informed machine translation error correction, a problem relevant to machine translation researchers and practitioners. The contribution is incremental, as it builds on existing QE and APE methods.
The paper tackles machine translation error correction with a training-free approach that selects the highest-quality translation from candidates generated by multiple LLMs. This approach achieved a Delta COMET score of 0.0201 and took first place on the WMT 2025 subtask leaderboard.
The paper presents two approaches submitted to the WMT 2025 Automated Translation Quality Evaluation Systems Task 3: Quality Estimation (QE)-informed Segment-level Error Correction. While jointly training QE systems with Automatic Post-Editing (APE) has shown improved performance for both tasks, APE systems are still known to overcorrect the output of Machine Translation (MT), leading to a degradation in performance. We investigate a simple training-free approach, QE-informed Retranslation, and compare it with a second approach within the same training-free paradigm. Our winning approach, QE-informed Retranslation, selects the highest-quality translation from multiple candidates generated by different LLMs. The second approach, more akin to APE, instructs an LLM to replace error substrings as specified in the provided QE explanation(s). A conditional heuristic was employed to minimise the number of edits, with the aim of maximising the Gain-to-Edit ratio. The two approaches achieved Delta COMET scores of 0.0201 and -0.0108, respectively, placing the first approach at the top of the subtask leaderboard.
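The selection step of QE-informed Retranslation can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `select_best_translation`, the `qe_score` callable, and the toy score table are hypothetical, and keeping the original MT output in the pool (so that it is returned when no candidate scores higher) is an assumption the abstract does not confirm. In practice `qe_score` would wrap a reference-free QE model.

```python
from typing import Callable, Sequence


def select_best_translation(
    source: str,
    original_mt: str,
    candidates: Sequence[str],
    qe_score: Callable[[str, str], float],
) -> str:
    """Return the hypothesis with the highest QE score.

    Assumption: the original MT output competes with the LLM candidates
    and is kept unless a candidate scores strictly higher.
    """
    best, best_score = original_mt, qe_score(source, original_mt)
    for candidate in candidates:
        score = qe_score(source, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Toy scorer for demonstration only: fixed, made-up scores per hypothesis.
TOY_SCORES = {
    "Das ist Test.": 0.55,
    "Das ist ein Test.": 0.90,
    "Dies ist ein Versuch.": 0.70,
}


def toy_qe(source: str, hypothesis: str) -> float:
    # Unknown hypotheses get the lowest possible score.
    return TOY_SCORES.get(hypothesis, 0.0)


chosen = select_best_translation(
    source="This is a test.",
    original_mt="Das ist Test.",
    candidates=["Das ist ein Test.", "Dies ist ein Versuch."],
    qe_score=toy_qe,
)
print(chosen)  # Das ist ein Test.
```

Because the original output is replaced only when a candidate scores strictly higher, this variant never lowers the QE score of the selected segment, which is one simple way to guard against overcorrection.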