LG | May 22, 2024

Gradient Projection For Continual Parameter-Efficient Tuning

Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Wensheng Zhang, Zhi Han, Yuan Xie

arXiv:2405.13383v3 | 10.41 citations | h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses forgetting issues in continual learning for large models, offering an incremental improvement to existing PET methods.

The paper tackles the trade-off between learning new content and protecting old knowledge in parameter-efficient tuning (PET) methods, proposing a unified gradient projection framework that reduces forgetting across various continual learning settings with less memory and training time.

Parameter-efficient tuning (PET) methods have demonstrated impressive performance and promise in training large models, yet they still face a common problem: the trade-off between learning new content and protecting old knowledge, which leads to zero-shot generalization collapse and cross-modal hallucination. In this paper, we reformulate Adapter, LoRA, Prefix-tuning, and Prompt-tuning from the perspective of gradient projection and propose, for the first time, a unified framework called Parameter Efficient Gradient Projection (PEGP). We introduce orthogonal gradient projection into different PET paradigms and theoretically demonstrate that the orthogonal condition for the gradient can effectively resist forgetting, even for large-scale models. PEGP therefore modifies the gradient toward the direction that has less impact on the old feature space, with less extra memory space and training time. We extensively evaluate our method with different backbones, including ViT and CLIP, on diverse datasets, and the experiments demonstrate its efficiency in reducing forgetting in class, online class, domain, task, and multi-modality continual settings. The project page is available at https://dmcv-ecnu-pegp.github.io/.
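
The orthogonality condition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the old feature space is estimated, so this example assumes a few stored old-task feature vectors and uses Gram-Schmidt as a stand-in to build an orthonormal basis. All names (`old_features`, `orthonormal_basis`, `project_out`) and the toy single-output linear layer are hypothetical. The idea shown is that removing the gradient component lying in the old feature subspace leaves the layer's outputs on old features unchanged after the update.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors` (assumed stand-in)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad, basis):
    """Remove from `grad` its component inside span(basis)."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical stored features from an old task (inputs to a linear layer).
old_features = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
basis = orthonormal_basis(old_features)

# Toy tunable weights and a gradient computed on the new task.
weights = [0.1, 0.2, 0.3]
grad = [0.5, -1.0, 2.0]
lr = 0.1

projected = project_out(grad, basis)
new_weights = [w - lr * g for w, g in zip(weights, projected)]

# The layer's response to each old feature is the same before and after the step.
for x in old_features:
    print(dot(x, weights), dot(x, new_weights))
```

Because the projected gradient is orthogonal to every old feature, the update changes the weights only along directions the old inputs never excite. In this toy case that is a single remaining direction, which also shows the trade-off the paper targets: the larger the protected subspace, the less room is left for learning new content.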
