Revisiting the Markov Property for Machine Translation
This work addresses a fundamental modeling question in machine translation that is relevant to both researchers and practitioners, but its contribution is incremental: it revisits an existing concept with new experiments.
The paper applies the Markov property to neural machine translation by designing a Markov Autoregressive Transformer (MAT). With an order larger than 4, MAT achieves translation quality comparable to conventional autoregressive transformers on four WMT benchmarks. Surprisingly, the advantages of a higher-order MAT do not specifically benefit the translation of longer sentences.
In this paper, we re-examine the Markov property in the context of neural machine translation. We design a Markov Autoregressive Transformer (MAT) and undertake a comprehensive assessment of its performance across four WMT benchmarks. Our findings indicate that MAT with an order larger than 4 can generate translations with quality on par with that of conventional autoregressive transformers. In addition, counter-intuitively, we also find that the advantages of utilizing a higher-order MAT do not specifically contribute to the translation of longer sentences.
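
To make the order-k Markov restriction concrete, the following is a minimal sketch of decoding in which each step conditions on the full source sentence but only on the previous k target tokens. It illustrates the general Markov property only; the abstract does not describe how MAT enforces it inside the transformer, so the names `markov_context`, `markov_greedy_decode`, and `toy_copy_model`, as well as the `<s>` and `</s>` markers, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence

BOS = "<s>"   # assumed begin-of-sentence marker (illustrative)
EOS = "</s>"  # assumed end-of-sentence marker (illustrative)


def markov_context(prefix: Sequence[str], order: int) -> List[str]:
    """Return the last `order` target tokens, the only ones an order-k model may see."""
    if order < 1:
        raise ValueError("order must be at least 1")
    return list(prefix[-order:])


def markov_greedy_decode(
    source: Sequence[str],
    next_token: Callable[[Sequence[str], Sequence[str]], str],
    order: int,
    max_len: int = 50,
) -> List[str]:
    """Greedy decoding that conditions on the full source sentence
    but only on the previous `order` target tokens."""
    target: List[str] = [BOS]
    for _ in range(max_len):
        context = markov_context(target, order)  # truncate the target history
        token = next_token(source, context)      # the model never sees older tokens
        target.append(token)
        if token == EOS:
            break
    return target[1:]  # drop the BOS marker


def toy_copy_model(source: Sequence[str], context: Sequence[str]) -> str:
    """Hypothetical stand-in for a trained model: copies the source token by token.
    It assumes the source tokens are unique, so the last target token alone
    is enough to locate the current position."""
    last = context[-1]
    if last == BOS:
        return source[0]
    i = list(source).index(last)
    return source[i + 1] if i + 1 < len(source) else EOS


if __name__ == "__main__":
    print(markov_greedy_decode(["a", "b", "c"], toy_copy_model, order=1))
```

A conventional autoregressive decoder corresponds to passing the whole target prefix to `next_token`. Under the assumptions above, the order is the only setting that changes how much target history is visible at each step.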