CL · Oct 25, 2023

Enhanced Simultaneous Machine Translation with Word-level Policies

arXiv:2310.16417v1 · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a practical bottleneck in SiMT for real-time translation applications by shifting from subword-level to word-level operations, though the contribution is incremental because it builds on existing policy frameworks.

The paper tackles Simultaneous Machine Translation (SiMT) and shows that word-level policies, which process all the subwords of a complete word in a single step, outperform subword-level ones. It also introduces a method for enhancing SiMT models with language models (LMs), in which the word-level policy resolves the subword mismatch between the LM and the SiMT model.
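To make the word-level idea concrete, here is a minimal sketch of a word-level wait-k schedule, in which one READ consumes every subword of a source word at once. It assumes SentencePiece-style tokens where "▁" marks the start of a word. The names `group_subwords` and `word_level_wait_k` are hypothetical and do not come from the WordSiMT repository, and wait-k is used only as a familiar example policy, not as a description of the authors' implementation.

```python
from typing import List, Tuple

# Assumption: SentencePiece-style marker "▁" (U+2581) begins every word.
WORD_START = "\u2581"


def group_subwords(subwords: List[str]) -> List[List[str]]:
    """Group a subword sequence into words (hypothetical helper)."""
    words: List[List[str]] = []
    for tok in subwords:
        # A marked token opens a new word; otherwise it continues the last one.
        if tok.startswith(WORD_START) or not words:
            words.append([tok])
        else:
            words[-1].append(tok)
    return words


def word_level_wait_k(src_subwords: List[str], n_tgt_words: int,
                      k: int) -> List[Tuple[str, str]]:
    """Hypothetical word-level wait-k schedule.

    Each READ consumes one whole source word (all of its subwords) and each
    WRITE emits one whole target word, labelled y1, y2, ... as placeholders.
    """
    src_words = group_subwords(src_subwords)
    actions: List[Tuple[str, str]] = []
    read = 0
    for t in range(n_tgt_words):
        # Before writing target word t (0-indexed), make min(k + t, |src|)
        # complete source words visible.
        while read < min(k + t, len(src_words)):
            actions.append(("READ", "".join(src_words[read])))
            read += 1
        actions.append(("WRITE", f"y{t + 1}"))
    return actions
```

For example, with source tokens `["▁Simult", "aneous", "▁translation", "▁is", "▁hard"]`, `k=2`, and three target words, the schedule reads "▁Simultaneous" and "▁translation" before writing `y1`. A subword-level wait-2 policy would instead start writing after "▁Simult" and "aneous", when only a single complete word is visible.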

Recent years have seen remarkable advances in the field of Simultaneous Machine Translation (SiMT) due to the introduction of innovative policies that dictate whether to READ or WRITE at each step of the translation process. However, a common assumption in many existing studies is that operations are carried out at the subword level, even though the standard unit for input and output in most practical scenarios is typically at the word level. This paper demonstrates that policies devised and validated at the subword level are surpassed by those operating at the word level, which process multiple subwords to form a complete word in a single step. Additionally, we suggest a method to boost SiMT models using language models (LMs), wherein the proposed word-level policy plays a vital role in addressing the subword disparity between LMs and SiMT models. Code is available at https://github.com/xl8-ai/WordSiMT.
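The subword disparity between an LM and a SiMT model can be illustrated with a small sketch. It assumes that the two models segment the same source text with different vocabularies. A prefix made of complete words can always be re-segmented by the other model's tokenizer, but a prefix that stops mid-word may match no token boundary on the other side. The token lists and helper names below are hypothetical examples, not taken from the paper or its code.

```python
from typing import List, Optional

# Assumption: SentencePiece-style marker "▁" (U+2581) begins every word.
WORD_START = "\u2581"


def visible_text(subwords: List[str], n: int) -> str:
    """Surface text covered by the first n subwords."""
    return "".join(subwords[:n]).replace(WORD_START, " ").strip()


def word_boundaries(subwords: List[str]) -> List[int]:
    """Prefix lengths n at which the first n subwords form complete words."""
    return [n for n in range(1, len(subwords) + 1)
            if n == len(subwords) or subwords[n].startswith(WORD_START)]


def matching_prefix(text: str, subwords: List[str]) -> Optional[int]:
    """Subword count in another segmentation covering exactly `text`, else None."""
    for n in range(len(subwords) + 1):
        if visible_text(subwords, n) == text:
            return n
    return None


# Hypothetical segmentations of the same sentence by two different vocabularies.
simt_tokens = ["\u2581Simult", "aneous", "\u2581translation"]
lm_tokens = ["\u2581Sim", "ultane", "ous", "\u2581trans", "lation"]
```

Here, after the SiMT model has read one subword, the visible text "Simult" has no counterpart among `lm_tokens` prefixes, so `matching_prefix` returns `None`. At every word boundary (2 and 3 subwords), the same text corresponds to an LM prefix of 3 or 5 subwords. This is why reading whole words keeps the two models in step.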
