Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
This work addresses real-time translation for applications that require both low latency and high quality. It is an incremental contribution that extends existing SiMT methods with new actions.
The paper tackles Simultaneous Machine Translation (SiMT) by extending the action space with adaptive actions such as SENTENCE_CUT and DROP, implemented in a decoder-only LLM framework. It reports improved semantic metrics and lower delay on the ACL60/60 English-Chinese and English-German benchmarks.
Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional encoder-decoder policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION, and PRONOMINALIZATION, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and latency, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese and English-German benchmarks show that our framework consistently improves semantic metrics (e.g., COMET-KIWI) and achieves lower delay (measured by Average Lagging) compared with reference translations and salami-based baselines. Notably, combining DROP and SENTENCE_CUT yields the best overall balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
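Average Lagging, the latency metric cited above, has a standard token-level definition introduced alongside the wait-k policy by Ma et al. (2019): AL = (1/τ) Σ_{i=1..τ} [g(i) − (i−1)/r], where g(i) is the number of source tokens read before emitting target token i, r = |y|/|x| is the target-to-source length ratio, and τ is the first target index at which the whole source has been read. The sketch below computes that token-level version only. It is not the paper's evaluation code: the latency-aware TTS pipeline presumably measures lag in speech time rather than tokens, which this sketch does not model, and using the hypothesis length for |y| is an assumption (some toolkits use the reference length).

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], source_len: int) -> float:
    """Token-level Average Lagging (AL) in the style of Ma et al. (2019).

    delays[i] holds g(i + 1): how many source tokens had been read before
    target token i + 1 was written. source_len is |x|.
    Assumption: r = |y| / |x| uses the hypothesis length |y| = len(delays).
    """
    if not delays or source_len <= 0:
        raise ValueError("need at least one target token and a positive source length")

    target_len = len(delays)
    r = target_len / source_len  # target-to-source length ratio

    # tau: first 1-based target index at which the full source has been read.
    # If the source is never fully read, fall back to all target tokens.
    tau = next((i + 1 for i, g in enumerate(delays) if g >= source_len), target_len)

    # Lag of token i is g(i) minus the "ideal" read position (i - 1) / r.
    total = sum(delays[i - 1] - (i - 1) / r for i in range(1, tau + 1))
    return total / tau


if __name__ == "__main__":
    # A wait-3 policy on a 5-token source and 5-token target lags by 3 tokens.
    print(average_lagging([3, 4, 5, 5, 5], source_len=5))  # → 3.0
    # Reading the whole source before writing anything gives AL = |x|.
    print(average_lagging([5, 5, 5, 5, 5], source_len=5))  # → 5.0
```

Truncating the sum at τ keeps tokens written after the source is exhausted from inflating the score, which is why a wait-k policy with equal source and target lengths scores exactly k.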