CL · Mar 5, 2020

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

arXiv:2003.02877v31001 citations
AI Analysis

This work addresses the need for efficient, domain-specific translation models, but it is incremental, since it builds on the existing techniques of distillation and adaptation.

The paper addresses the problem of training small, memory-efficient machine translation models by studying how domain adaptation and knowledge distillation interact. Its large-scale experiments, covering three language pairs with three domains each, suggest that distilling twice gives the best performance: first on general-domain data, then on in-domain data with an adapted teacher.

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
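
The recipe in the abstract can be sketched as a four-step pipeline. The code below is only an illustration under loud assumptions, not the authors' implementation: `ToyModel` is a hypothetical lookup-table stand-in for a real NMT system, and `distill_corpus` and `distill_adapt_distill` are names invented here. The ordering of the steps, and the choice to continue training the same student in step 4, are one reading of "distilling twice" and are not spelled out in the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


@dataclass
class ToyModel:
    """Hypothetical stand-in for an NMT model: it memorizes source -> target."""
    name: str
    table: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def train(self, pairs: List[Pair], tag: str) -> None:
        # Later training overrides earlier entries, loosely mimicking fine-tuning.
        for src, tgt in pairs:
            self.table[src] = tgt
        self.history.append(tag)

    def translate(self, src: str) -> str:
        return self.table.get(src, "<unk>")


def distill_corpus(teacher: ToyModel, pairs: List[Pair]) -> List[Pair]:
    """Sequence-level distillation: swap reference targets for teacher outputs."""
    return [(src, teacher.translate(src)) for src, _ in pairs]


def distill_adapt_distill(general: List[Pair], in_domain: List[Pair]):
    # 1. Train a teacher on general-domain data.
    teacher = ToyModel("teacher")
    teacher.train(general, "general")

    # 2. Distill: train the small student on teacher translations of general data.
    student = ToyModel("student")
    student.train(distill_corpus(teacher, general), "distill-general")

    # 3. Adapt: fine-tune the teacher on in-domain data.
    teacher.train(in_domain, "adapt-in-domain")

    # 4. Distill again: continue training the student on in-domain data
    #    translated by the adapted teacher (assumed continuation of step 2).
    student.train(distill_corpus(teacher, in_domain), "distill-in-domain")
    return teacher, student


if __name__ == "__main__":
    general = [("hallo", "hello"), ("bank", "bench")]
    in_domain = [("bank", "bank")]  # e.g. a finance-domain sense of the word
    _, student = distill_adapt_distill(general, in_domain)
    print(student.history, student.translate("bank"))
```

In a real setup each `train` call would be a full training or fine-tuning run, and `translate` would typically be beam-search decoding over the source side of the corpus.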

