Analysis of Levenshtein Transformer's Decoder and Its Variants
This is an incremental analysis, intended for researchers in the field, aimed at improving non-autoregressive machine translation models.
The paper analyzes the Levenshtein Transformer's decoder to identify deficiencies in translation length, subword generation, and deletion capability, and compares variants such as knowledge-distilled and translation-memory-enhanced models to assess how much they improve on it.
The Levenshtein Transformer (LevT) is a non-autoregressive machine translation model with high decoding efficiency and comparable translation quality in terms of BLEU score, owing to its parallel decoding and iterative refinement procedure. Do its translations have any deficiencies, and what improvements could be made? In this report, we focus on LevT's decoder and analyse the length of its decoding results, its subword generation, and the capability of its deletion module. We hope to identify weaknesses of the decoder that future work can address. We also compare translations from the original LevT, knowledge-distilled LevT (KD-LevT), LevT with translation memory, and KD-LevT with translation memory to see how knowledge distillation (KD) and translation memory can help.