CV · Mar 17, 2023

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, Jian Zhang

arXiv:2303.10070v2 · 32.5 · 158 citations · h-index: 73 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient and effective continual learning for AI practitioners by enabling broader use of PET methods, though it is incremental as it builds on existing PET paradigms.

The paper tackles the challenge of applying Parameter-Efficient-Tuning (PET) methods beyond prompting in Continual Learning (CL). It proposes a unified framework, Learning-Accumulation-Ensemble (LAE), which can incorporate PET methods such as Adapter, LoRA, or Prefix to adapt pre-trained models to new tasks with fewer parameters. With Adapter, LAE surpasses the prior state of the art by 1.3% and 3.6% in last-incremental accuracy on CIFAR100 and ImageNet-R, respectively.

The "pre-training → downstream adaptation" paradigm presents both new opportunities and challenges for Continual Learning (CL). Although the recent state of the art in CL is achieved through the Parameter-Efficient-Tuning (PET) adaptation paradigm, only prompting has been explored, limiting its application to Transformers. In this paper, we position prompting as one instantiation of PET and propose a unified CL framework with general PET, dubbed Learning-Accumulation-Ensemble (LAE). PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources. Given a PET method, our LAE framework incorporates it for CL with three novel designs. 1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, with our adaptation speed calibration aligning different PET modules. 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through a momentum update. 3) Ensemble: during inference, we construct two experts, one with the online PET module and one with the offline PET module (favored by novel and historical tasks, respectively), for prediction ensemble. We show that LAE is compatible with a battery of PET methods and gains strong CL capability. For example, LAE with Adapter PET surpasses the prior state of the art by 1.3% and 3.6% in last-incremental accuracy on the CIFAR100 and ImageNet-R datasets, respectively. Code is available at https://github.com/gqk/LAE.
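
The Accumulation and Ensemble steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The function names, the flat-list representation of PET parameters, the momentum value, and the rule of taking the per-class maximum of the two experts' softmax probabilities are all assumptions made here for illustration.

```python
import math


def momentum_update(offline, online, momentum=0.999):
    """Accumulate online PET parameters into the offline copy.

    Element-wise exponential moving average:
        offline <- momentum * offline + (1 - momentum) * online
    The default momentum is an assumed value, not taken from the abstract.
    """
    return [momentum * off + (1.0 - momentum) * on
            for off, on in zip(offline, online)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(online_logits, offline_logits):
    """Combine the two experts and return the predicted class index.

    Assumed rule: take the per-class maximum of the two probability
    vectors, then pick the class with the highest combined score.
    """
    p_online = softmax(online_logits)
    p_offline = softmax(offline_logits)
    combined = [max(a, b) for a, b in zip(p_online, p_offline)]
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical toy values: two PET parameters and three classes.
offline_params = momentum_update([0.0, 0.0], [1.0, -2.0], momentum=0.9)
predicted = ensemble_predict([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])
```

In this toy run the offline parameters move only a tenth of the way toward the online ones, which is how a slow-moving copy can retain knowledge from earlier tasks while the online module adapts quickly to the new one.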
