Hidden Markov Transformer for Simultaneous Machine Translation
The work addresses the core timing problem in simultaneous machine translation for real-time applications, and it represents a novel method rather than an incremental improvement.
The paper tackles the challenge of deciding when to start translating each target token in simultaneous machine translation. It proposes a Hidden Markov Transformer (HMT), which models the candidate moments as hidden events and selects one of them to generate each token, achieving state-of-the-art performance on multiple benchmarks.
Simultaneous machine translation (SiMT) outputs the target sequence while still receiving the source sequence, and hence learning when to start translating each target token is the core challenge of the SiMT task. However, it is non-trivial to learn the optimal moment among the many possible moments at which translation could start, because these moments are hidden inside the model and can only be supervised through the observed target sequence. In this paper, we propose a Hidden Markov Transformer (HMT), which treats the moments of starting translation as hidden events and the target sequence as the corresponding observed events, thereby organizing them as a hidden Markov model. HMT explicitly models multiple moments of starting translation as candidate hidden events, and then selects one of them to generate the target token. During training, by maximizing the marginal likelihood of the target sequence over the multiple candidate moments, HMT learns to start translating at the moments at which target tokens can be generated more accurately. Experiments on multiple SiMT benchmarks show that HMT outperforms strong baselines and achieves state-of-the-art performance.
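
The idea of marginalizing the target likelihood over hidden moments can be illustrated with a generic hidden Markov forward recursion. What follows is a minimal sketch under stated assumptions, not the paper's implementation. The function names are hypothetical, the tables `init`, `trans`, and `emit` hold made-up toy values, and the way HMT actually produces such probabilities with a Transformer is not reproduced. Each hidden state stands for one candidate moment (how many source tokens have been read), and the transition table is assumed to be monotonic, so the model can never "un-read" source tokens.

```python
from itertools import product


def marginal_likelihood(init, trans, emit):
    """Forward algorithm for a generic HMM (illustrative, not HMT itself).

    init[z]      : probability that the first hidden moment is z
    trans[zp][z] : probability of moving from moment zp to moment z
    emit[i][z]   : probability of the observed i-th target token when
                   translation starts at moment z
    Returns the likelihood of the observed target sequence summed over
    all hidden paths.
    """
    n_states = len(init)
    # alpha[z] = p(y_1..y_i, z_i = z), initialised for the first token
    alpha = [init[z] * emit[0][z] for z in range(n_states)]
    for i in range(1, len(emit)):
        alpha = [
            emit[i][z] * sum(alpha[zp] * trans[zp][z] for zp in range(n_states))
            for z in range(n_states)
        ]
    return sum(alpha)


def brute_force_marginal(init, trans, emit):
    """Same quantity by enumerating every hidden path (for checking only)."""
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(emit)):
        p = init[path[0]] * emit[0][path[0]]
        for i in range(1, len(path)):
            p *= trans[path[i - 1]][path[i]] * emit[i][path[i]]
        total += p
    return total


# Toy values (assumptions): 3 candidate moments, 2 target tokens.
init = [0.5, 0.3, 0.2]
trans = [
    [0.6, 0.3, 0.1],  # from "read 1 source token"
    [0.0, 0.7, 0.3],  # from "read 2": cannot go back to 1
    [0.0, 0.0, 1.0],  # from "read 3": must stay at 3
]
emit = [
    [0.2, 0.5, 0.6],  # p(y_1 | moment): more context, higher probability
    [0.1, 0.4, 0.7],  # p(y_2 | moment)
]

if __name__ == "__main__":
    print(marginal_likelihood(init, trans, emit))
    print(brute_force_marginal(init, trans, emit))
```

With these toy tables both functions evaluate to approximately 0.1825. In a training setting one would maximize the logarithm of this marginal with respect to the parameters that produce the tables, and for longer sequences the recursion is usually carried out in log space to avoid numerical underflow.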