LG AI CV · Dec 14, 2020

Multi-Domain Multi-Task Rehearsal for Lifelong Learning

Fan Lyu, Shuai Wang, Wei Feng, Zihan Ye, Fuyuan Hu, Song Wang

arXiv:2012.07236v1 12.834 citations

Originality: Incremental advance

AI Analysis

This paper provides an incremental improvement for researchers and practitioners working on lifelong learning, specifically addressing the problem of unpredictable domain shift in rehearsal-based methods.

This paper addresses catastrophic forgetting in lifelong learning by proposing Multi-Domain Multi-Task (MDMT) rehearsal. It tackles the unpredictable domain shift between old and new tasks, which arises from data imbalance and task isolation, by training tasks in parallel and equally. The method introduces a two-level angular margin loss for class/task compactness and discrepancy, and an optional episodic distillation loss to anchor knowledge for old tasks, effectively mitigating domain shift on benchmark datasets.

Rehearsal, which reminds the model by storing old knowledge during lifelong learning, is one of the most effective ways to mitigate catastrophic forgetting, i.e., biased forgetting of previous knowledge when moving to new tasks. However, in most previous rehearsal-based methods, the old tasks suffer from unpredictable domain shift when the new task is trained. This is because these methods ignore two significant factors. First, the Data Imbalance between the new task and the old tasks makes the domain of the old tasks prone to shift. Second, the Task Isolation among all tasks pushes the domain shift in unpredictable directions. To address the unpredictable domain shift, we propose Multi-Domain Multi-Task (MDMT) rehearsal, which trains the old tasks and the new task in parallel and equally to break the isolation among tasks. Specifically, a two-level angular margin loss is proposed to encourage intra-class/task compactness and inter-class/task discrepancy, which keeps the model from domain chaos. In addition, to further address the domain shift of the old tasks, we propose an optional episodic distillation loss on the memory to anchor the knowledge for each old task. Experiments on benchmark datasets validate that the proposed approach can effectively mitigate the unpredictable domain shift.
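The abstract does not give the formula for the two-level angular margin loss, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a generic additive angular margin (ArcFace-style) applied once against per-class prototypes and once against per-task prototypes, with the two terms summed. The names `angular_margin_loss`, `two_level_loss`, `class_w`, `task_w`, `task_weight`, and the values of `margin` and `scale` are all hypothetical.

```python
import numpy as np


def angular_margin_loss(features, weights, labels, margin=0.5, scale=16.0):
    """Softmax cross-entropy with an additive angular margin on the target.

    Generic ArcFace-style loss, used here as a stand-in (assumption):
    the paper's exact loss is not given in the abstract.
    """
    # L2-normalise so that dot products become cosines of angles.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)  # shape: (batch, num_groups)
    theta = np.arccos(cos)

    rows = np.arange(len(labels))
    logits = scale * cos
    # Widen only the angle to the target prototype; cap at pi so the
    # cosine stays monotonic in the angle.
    target_angle = np.minimum(theta[rows, labels] + margin, np.pi)
    logits[rows, labels] = scale * np.cos(target_angle)

    # Numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())


def two_level_loss(features, class_w, task_w, class_labels, task_labels,
                   task_weight=1.0, margin=0.5):
    """Sum of a class-level and a task-level margin term (hypothetical)."""
    class_term = angular_margin_loss(features, class_w, class_labels, margin)
    task_term = angular_margin_loss(features, task_w, task_labels, margin)
    return class_term + task_weight * task_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))        # embeddings from new + memory data
    class_w = rng.normal(size=(10, 16))     # one prototype per class
    task_w = rng.normal(size=(3, 16))       # one prototype per task
    class_labels = rng.integers(0, 10, size=8)
    task_labels = rng.integers(0, 3, size=8)
    print(two_level_loss(feats, class_w, task_w, class_labels, task_labels))
```

In this sketch, the margin makes the target harder to satisfy, which pulls same-class (or same-task) features closer to their prototype and pushes other prototypes away. A batch that mixes new-task samples with an equal share of memory samples from each old task would correspond to the "parallel and equal" training described above, although the sampling scheme is not shown here.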
