CL · Dec 6, 2022

Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation

arXiv:2212.02800v1 · 9 citations · h-index: 39
Originality: Incremental advance
AI Analysis

This paper addresses the problem of maintaining translation performance across languages in incremental learning systems, though its contribution is incremental: it applies distillation to specific multilingual scenarios.

The paper tackles catastrophic forgetting in multilingual neural machine translation when tasks arrive sequentially. It proposes knowledge distillation methods for the one-to-many and many-to-one scenarios and reports that they sharply alleviate forgetting across twelve translation tasks.

A common scenario in Multilingual Neural Machine Translation (MNMT) is that translation tasks arrive sequentially and the training data of previous tasks is unavailable. In this scenario, current methods suffer heavily from catastrophic forgetting (CF). To alleviate CF, we investigate knowledge-distillation-based life-long learning methods. Specifically, in the one-to-many scenario, we propose a multilingual distillation method that makes the new model (student) jointly learn multilingual output from the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. The experimental results on twelve translation tasks show that the proposed methods can better consolidate the previous knowledge and sharply alleviate CF.
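The idea of a student learning jointly from a teacher and a new task can be illustrated with a generic distillation objective. The sketch below is not the paper's implementation. It assumes a per-token loss that adds the usual cross-entropy on the new task's gold token to a KL term pulling the student's distribution toward the teacher's on an old-task output. The function names, the weight `alpha`, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def new_task_loss(student_logits, gold_index):
    """Cross-entropy of the student against the new task's gold token."""
    probs = softmax(student_logits)
    return -math.log(probs[gold_index])


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL from the teacher's distribution to the student's (old-task knowledge)."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)


def joint_loss(new_student_logits, gold_index,
               old_student_logits, old_teacher_logits,
               alpha=1.0, temperature=1.0):
    """Assumed joint objective: new-task loss plus weighted distillation loss."""
    return (
        new_task_loss(new_student_logits, gold_index)
        + alpha * distillation_loss(old_student_logits, old_teacher_logits, temperature)
    )


if __name__ == "__main__":
    # Toy 3-word vocabulary; all numbers are made up for illustration.
    loss = joint_loss(
        new_student_logits=[0.2, 1.5, -0.3], gold_index=1,
        old_student_logits=[0.1, 0.4, 2.0], old_teacher_logits=[0.0, 0.5, 2.5],
        alpha=0.5,
    )
    print(f"joint loss: {loss:.4f}")
```

When the student already reproduces the teacher's distribution on old-task outputs, the distillation term vanishes and only the new-task loss remains. That is the sense in which such a term penalizes drift away from previous knowledge without requiring the old training data.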

