Incremental Meta-Learning via Episodic Replay Distillation for Few-Shot Image Recognition
This work addresses catastrophic forgetting in incremental meta-learning and is aimed at researchers and practitioners in few-shot learning; it represents an incremental improvement over existing methods.
The paper tackles incremental meta-learning for few-shot image recognition, where data arrives as a sequence of tasks with disjoint classes. It proposes Episodic Replay Distillation (ERD), which mixes classes from the current task with exemplars of classes from previous tasks when sampling meta-learning episodes, and uses these episodes for knowledge distillation to reduce catastrophic forgetting. Results show that ERD surpasses state-of-the-art methods: in the one-shot, long-task-sequence setting, it reduces the gap to the joint-training upper bound from 3.5%/10.1%/13.4% to 2.6%/2.9%/5.0% on Tiered-ImageNet, Mini-ImageNet, and CIFAR100, respectively.
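The episode sampling described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how many past classes each episode replays or how exemplars are stored, so the function `sample_mixed_episode`, its `n_old` parameter, and the dictionary layout mapping class names to sample identifiers are all hypothetical choices.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: each class name maps to a list of sample identifiers.
ClassPool = Dict[str, List[str]]
Labeled = List[Tuple[str, int]]


def sample_mixed_episode(
    current: ClassPool,
    exemplars: ClassPool,
    n_way: int,
    k_shot: int,
    n_query: int,
    n_old: int,
    rng: random.Random,
) -> Tuple[Labeled, Labeled]:
    """Sample one N-way K-shot episode in which n_old classes come from the
    exemplar memory of past tasks and the rest from the current task.

    Returns (support, query) lists of (sample_id, episode_label) pairs.
    """
    n_old = min(n_old, len(exemplars))  # cannot replay more old classes than stored
    n_new = n_way - n_old
    if n_new > len(current):
        raise ValueError("not enough current-task classes for this episode")

    # Draw old classes from the exemplar memory and new ones from the current task.
    classes = [exemplars[c] for c in rng.sample(sorted(exemplars), n_old)]
    classes += [current[c] for c in rng.sample(sorted(current), n_new)]
    rng.shuffle(classes)  # episode labels do not reveal old vs. new classes

    support: Labeled = []
    query: Labeled = []
    for label, pool in enumerate(classes):
        if len(pool) < k_shot + n_query:
            raise ValueError("class has too few samples for k_shot + n_query")
        picked = rng.sample(pool, k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query
```

For example, with five current-task classes and ten stored past classes, calling `sample_mixed_episode(current, exemplars, n_way=5, k_shot=1, n_query=2, n_old=2, rng=random.Random(0))` yields a one-shot episode with 5 support and 10 query samples, two of whose classes are replayed from memory. Because exemplar memories are usually small, `k_shot + n_query` must not exceed the number of exemplars kept per class.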
Most meta-learning approaches assume the existence of a very large set of labeled data available for episodic meta-learning of base knowledge. This contrasts with the more realistic continual learning paradigm in which data arrives incrementally in the form of tasks containing disjoint classes. In this paper, we consider this problem of Incremental Meta-Learning (IML), in which classes are presented incrementally in discrete tasks. We propose an approach to IML, which we call Episodic Replay Distillation (ERD), that mixes classes from the current task with class exemplars from previous tasks when sampling episodes for meta-learning. These episodes are then used for knowledge distillation to minimize catastrophic forgetting. Experiments on four datasets demonstrate that ERD surpasses the state of the art. In particular, on the more challenging one-shot, long task sequence incremental meta-learning scenarios, we reduce the gap between IML and the joint-training upper bound from 3.5% / 10.1% / 13.4% with the current state-of-the-art to 2.6% / 2.9% / 5.0% with our method on Tiered-ImageNet / Mini-ImageNet / CIFAR100, respectively.
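The distillation step on these mixed episodes can be illustrated with a standard formulation, under explicit assumptions. The abstract does not give ERD's loss, so this sketch swaps in two common choices: a prototypical-network classifier (logits are negative squared distances to per-class mean embeddings) and a Hinton-style temperature-scaled KL divergence between a teacher, assumed here to be a frozen copy of the model from the previous task, and the current student. All function names are hypothetical and plain lists stand in for embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def prototypes(support: List[Vector], labels: List[int], n_way: int) -> List[List[float]]:
    """Mean embedding per episode class (prototypical-network style)."""
    dim = len(support[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for x, y in zip(support, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[v / counts[c] for v in sums[c]] for c in range(n_way)]


def episode_logits(query: List[Vector], protos: List[List[float]]) -> List[List[float]]:
    """Logits are negative squared Euclidean distances to each prototype."""
    return [[-sum((q[d] - p[d]) ** 2 for d in range(len(p))) for p in protos] for q in query]


def softmax(logits: List[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[List[float]],
    student_logits: List[List[float]],
    temperature: float = 2.0,
) -> float:
    """Mean KL(teacher || student) over query rows, scaled by T^2."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        # Clamp student probabilities to avoid log(0) after underflow.
        total += sum(pi * (math.log(pi) - math.log(max(qi, 1e-12))) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * total / len(teacher_logits)
```

In a training loop built on this sketch, the teacher and student would each embed the same mixed episode, compute `episode_logits` from their own prototypes, and the student would minimize its few-shot loss plus `distillation_loss`, so that its predictions on replayed old classes stay close to the previous model's. The T^2 factor keeps the gradient scale roughly independent of the temperature.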