LG AI CV | Oct 14, 2024

ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy

Hong Li, Zhiquan Tan, Xingyu Li, Weiran Huang

arXiv:2410.10923v1 | 12.58 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of continual learning for vision-and-language models, offering a method that is an incremental advance but shows strong gains in specific scenarios.

The paper tackles catastrophic forgetting in multi-modal continual learning by proposing a two-stage adapter-based learning paradigm, which improves generalization for downstream tasks and reduces forgetting of previous tasks.

While vision-and-language models have advanced significantly in many fields, the challenge of continual learning remains unsolved. Parameter-efficient modules like adapters and prompts present a promising way to alleviate catastrophic forgetting. However, existing works usually learn individual adapters for each task, which may result in redundant knowledge among adapters. Moreover, they continue to use the original pre-trained model to initialize the downstream model, leading to negligible changes in the model's generalization compared to the original model. In addition, little research has investigated the consequences of integrating a multi-modal model into the updating procedure for both uni-modal and multi-modal tasks, or the subsequent impact on downstream tasks. In this paper, we propose an adapter-based two-stage learning paradigm, a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion, which helps the model make full use of experience knowledge and compensate for novel knowledge. Extensive experiments demonstrate that our method is proficient at continual learning. It expands the distribution of representations upstream while also minimizing the negative impact of forgetting previous tasks. Additionally, it enhances the generalization capability for downstream tasks. Furthermore, we incorporate both multi-modal and uni-modal tasks into upstream continual learning. We observe that learning from upstream tasks can help with downstream tasks. Our code will be available at: https://github.com/lihong2303/ATLAS.
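For readers unfamiliar with the parameter-efficient modules the abstract refers to, the following is a minimal sketch of a generic residual bottleneck adapter. It is not the ATLAS implementation: the class name `BottleneckAdapter`, the feature width of 768, the bottleneck size of 64, and the zero-initialised up-projection are all assumptions chosen for illustration, and NumPy stands in for a deep learning framework.

```python
import numpy as np


class BottleneckAdapter:
    """Generic residual bottleneck adapter: x + up(relu(down(x))).

    Illustrative only; names and sizes are hypothetical, not from the paper.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection into a small bottleneck space.
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Up-projection starts at zero, so the adapter is initially an
        # identity map and leaves the frozen backbone's features unchanged.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x):
        hidden = np.maximum(x @ self.w_down + self.b_down, 0.0)  # ReLU
        return x + hidden @ self.w_up + self.b_up  # residual connection

    def num_params(self):
        params = (self.w_down, self.b_down, self.w_up, self.b_up)
        return sum(p.size for p in params)


# Stand-in for a batch of frozen backbone features (4 tokens, width 768).
features = np.ones((4, 768))
adapter = BottleneckAdapter(dim=768, bottleneck=64)
out = adapter(features)
```

Only the adapter's weights would be trained for a new task while the backbone stays frozen, which is why such modules are considered a way to limit forgetting. How ATLAS reuses or combines adapters across its two stages is not specified in the abstract and is not modelled here.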
