CL, AI, LG · May 27, 2025

SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

arXiv:2505.20622v1 · 13 citations · h-index: 5 · ACL
Originality: Highly original
AI Analysis

This paper addresses real-time translation with minimal delay for users who need immediate cross-lingual communication. It represents a strong, specific gain rather than a broad breakthrough.

The paper tackles simultaneous machine translation by proposing SeqPO-SiMT, a policy optimization framework that treats it as a sequential decision problem, achieving higher translation quality with lower latency. On the NEWSTEST2021 En to Zh dataset, it outperforms supervised fine-tuning by 1.13 COMET points and reduces Average Lagging by 6.17.

We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision-making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied in single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows the SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the supervised fine-tuning (SFT) model by 1.13 points in COMET, while reducing the Average Lagging by 6.17 on the NEWSTEST2021 En to Zh dataset. Although SiMT operates with far less context than offline translation, the SiMT results of SeqPO-SiMT on a 7B LLM surprisingly rival the offline translation of high-performing LLMs, including Qwen-2.5-7B-Instruct and LLaMA-3-8B-Instruct.
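The latency figure quoted above is Average Lagging (AL). As a rough illustration of what that number measures, here is a minimal sketch of the definition commonly used in the wait-k literature: for each target token, count how many source tokens had been read when it was emitted, and compare that with an ideal translator that keeps perfect pace with the source. The function name `average_lagging` and the `delays` input format are assumptions made for this example; the paper's exact variant and token unit are not stated on this page.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], src_len: int) -> float:
    """Average Lagging for one sentence (illustrative sketch).

    delays[t] is the number of source tokens that had been read when
    target token t (0-indexed) was emitted. src_len is the source length.
    """
    if not delays or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    tgt_len = len(delays)
    gamma = tgt_len / src_len  # target-to-source length ratio

    # tau: 1-indexed position of the first target token emitted after the
    # whole source has been read; fall back to the last target token.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= src_len),
        tgt_len,
    )

    # Lag of each token relative to an ideal, perfectly paced translator.
    total = sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1))
    return total / tau


# A wait-3 policy on a 6-token source with a 6-token target:
wait3 = average_lagging([3, 4, 5, 6, 6, 6], src_len=6)

# Offline translation reads the whole source before writing anything:
offline = average_lagging([6, 6, 6, 6, 6, 6], src_len=6)
```

With equal source and target lengths, the wait-3 schedule lags by 3 tokens and the offline schedule by the full source length of 6, which is why a lower AL indicates a more responsive system.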

