CL · May 27, 2021

How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?

arXiv:2105.12900v1714 citations
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck for non-autoregressive translation models, namely their reliance on knowledge distillation, by analyzing how different types of data complexity affect them. The advance is incremental.

The paper investigates why knowledge distillation improves non-autoregressive machine translation. It finds that the lower lexical diversity and lower reordering complexity of distilled data both improve translation quality, and that the reduction in lexical diversity is the main reason distillation raises model confidence, which in turn affects calibration differently across models.

While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models. To address this issue, we seek to understand why distillation is so effective. Prior work suggests that distilled training data is less complex than manual translations. Based on experiments with the Levenshtein Transformer and the Mask-Predict NAR models on the WMT14 German-English task, this paper shows that different types of complexity have different impacts: while reducing lexical diversity and decreasing reordering complexity both help NAR learn better alignment between source and target, and thus improve translation quality, lexical diversity is the main reason why distillation increases model confidence, which affects the calibration of different NAR models differently.

