CL · Jul 1, 2021

Knowledge Distillation for Quality Estimation

arXiv:2107.00411v1711 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, real-time quality estimation in applications such as translating online social media conversations, though its contribution is an incremental improvement in model efficiency.

The paper tackles the problem of reducing the size and computational cost of Quality Estimation models for Machine Translation. Its light-weight models perform competitively with models built on distilled pre-trained representations while using 8x fewer parameters.

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
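The teacher-to-student transfer described in the abstract can be illustrated with a toy sketch. The abstract does not state the training objective, so this example assumes the student is trained by regressing onto the teacher's predicted quality scores with a mean squared error loss, and that data augmentation simply means scoring extra unlabeled inputs with the teacher. Every name here (`toy_teacher`, `train_student`, the two-dimensional features, the linear student) is a hypothetical stand-in and not the paper's actual models or data.

```python
from typing import Callable, List, Sequence, Tuple


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between two equal-length lists of scores."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def toy_teacher(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a large QE teacher: maps features to a score."""
    return 0.7 * x[0] - 0.2 * x[1] + 0.1


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear student prediction for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_student(
    features: List[List[float]],
    teacher_scores: List[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit a small linear student to the teacher's scores by gradient descent
    on the MSE between student and teacher predictions (assumed objective)."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in zip(features, teacher_scores):
            err = predict(w, b, x) - t
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Original inputs plus "augmented" inputs; the teacher scores all of them,
# so no human quality labels are needed for the extra data.
original = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
augmented = [[1.0, 1.0], [0.2, 0.8], [0.9, 0.3]]
features = original + augmented
teacher_scores = [toy_teacher(x) for x in features]

weights, bias = train_student(features, teacher_scores)
student_scores = [predict(weights, bias, x) for x in features]
final_loss = mse(student_scores, teacher_scores)
print("distillation loss:", final_loss)
```

In this toy setting the student can match the teacher exactly because both are linear. With a real QE teacher and a shallower student architecture, the loss would only measure how closely the student approximates the teacher.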

Code Implementations: 1 repo