What Have We Achieved on Non-autoregressive Translation?
This work addresses the uncertainty about how close non-autoregressive translation really is to autoregressive translation by providing a comprehensive comparison for practitioners, though its contribution is incremental, since it evaluates existing methods.
The paper systematically evaluates four representative non-autoregressive translation (NAT) methods and finds that, although they have narrowed the performance gap, they still underperform autoregressive translation (AT) under more reliable evaluation such as human evaluation. It also finds that explicitly modeling dependencies is crucial for generating natural language.
Recent advances have made non-autoregressive translation (NAT) comparable to autoregressive translation (AT). However, this comparison is based on BLEU, which has been shown to correlate weakly with human annotations. Limited research compares NAT and AT comprehensively, leaving uncertainty about how close NAT truly is to AT. To address this gap, we systematically evaluate four representative NAT methods across various dimensions, including human evaluation. Our empirical results demonstrate that, despite narrowing the performance gap, state-of-the-art NAT still underperforms AT under more reliable evaluation metrics. Furthermore, we discover that explicitly modeling dependencies is crucial for generating natural language and generalizing to out-of-distribution sequences.
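To make "explicitly modeling dependencies" concrete, the textbook factorizations of the two model families can be contrasted as follows. This is a sketch in standard notation, not the paper's own: the symbols x (source sentence), y (target sentence), T (target length), and the length predictor p(T | x) are assumptions introduced here for illustration. The second line describes a vanilla, fully non-autoregressive model; the four methods evaluated in the paper may relax this independence assumption to varying degrees.

```latex
% Autoregressive translation: each target token is conditioned on
% all previously generated tokens, so dependencies are explicit.
p_{\mathrm{AT}}(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)

% Vanilla non-autoregressive translation: given a predicted length T,
% target tokens are generated independently of one another.
p_{\mathrm{NAT}}(y \mid x) = p(T \mid x) \prod_{t=1}^{T} p(y_t \mid x)
```

Under the second factorization all positions can be decoded in parallel, which is where the speed advantage comes from, but no term links one target token to another.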