CL AI Jul 16, 2024

SwitchCIT: Switching for Continual Instruction Tuning

Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R. Varshney

arXiv:2407.11780v21.91 citations h-index: 34

Originality: Incremental advance

AI Analysis

This work addresses the problem of adapting large models to evolving tasks without forgetting earlier ones, a concern for AI and machine learning practitioners. It is an incremental advance because it builds on existing continual learning and parameter-efficient tuning methods.

The paper tackles catastrophic forgetting in continual instruction tuning of large language and multimodal models by introducing a switching mechanism to route computations to parameter-efficient tuned models, demonstrating effectiveness across natural language generation and vision-language tasks with advantages in efficiency, scalability, portability, and privacy.

Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial for adapting a large model to evolving tasks and domains, ensuring its effectiveness and relevance across a wide range of applications. In continual instruction tuning, where models are trained sequentially on different tasks, catastrophic forgetting can occur, degrading performance on previously learned tasks. This work addresses catastrophic forgetting in continual instruction tuning through a switching mechanism that routes computations to parameter-efficient tuned models. We demonstrate the effectiveness of our method through experiments on continual instruction tuning of different natural language generation tasks and vision-language tasks. We also showcase the advantages of our proposed method in terms of efficiency, scalability, portability, and privacy preservation.
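The routing idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify how the switch is built or trained, so the nearest-centroid classifier over bag-of-words features, the `Switch` and `SwitchedModel` classes, and the plain Python functions standing in for a frozen base model and its parameter-efficient tuned variants are all assumptions made for illustration.

```python
from collections import Counter
import math


def featurize(instruction):
    """Toy bag-of-words features (assumed stand-in for learned features)."""
    return Counter(instruction.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Switch:
    """Hypothetical switch: nearest-centroid task classifier over instructions."""

    def __init__(self):
        self.centroids = {}  # task name -> summed feature counts

    def add_task(self, task, example_instructions):
        # A new task only adds a centroid; earlier centroids are untouched.
        total = Counter()
        for text in example_instructions:
            total.update(featurize(text))
        self.centroids[task] = total

    def route(self, instruction):
        feats = featurize(instruction)
        return max(self.centroids, key=lambda t: cosine(feats, self.centroids[t]))


class SwitchedModel:
    """Frozen base function plus one task-specific adapter per learned task."""

    def __init__(self, base_fn):
        self.base_fn = base_fn  # stands in for the frozen large model
        self.switch = Switch()
        self.adapters = {}      # task name -> stand-in for a tuned module

    def learn_task(self, task, example_instructions, adapter_fn):
        self.switch.add_task(task, example_instructions)
        self.adapters[task] = adapter_fn

    def generate(self, instruction, text):
        task = self.switch.route(instruction)
        return task, self.adapters[task](self.base_fn(text))


# Tasks arrive sequentially; each one adds a centroid and an adapter.
model = SwitchedModel(base_fn=str.strip)
model.learn_task("uppercase", ["convert the text to upper case"], str.upper)
model.learn_task("reverse", ["reverse the characters of the text"], lambda s: s[::-1])

print(model.generate("please reverse the text", " abc "))
print(model.generate("convert this to upper case", " abc "))
```

Because each task keeps its own adapter and the base function is never modified, learning the second task in this sketch cannot overwrite the behavior learned for the first; any forgetting would have to come from the switch misrouting an instruction.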
