Memory-Efficient Continual Learning with CLIP Models
For practitioners using CLIP models in continual learning scenarios, this method offers a memory-efficient solution to catastrophic forgetting.
CLIP models suffer from catastrophic forgetting in continual learning when the memory buffer is small. The proposed method dynamically reweights losses per class, achieving strong performance on CIFAR-100, ImageNet1K, and DomainNet with minimal memory usage.
Contrastive Language-Image Pretraining (CLIP) models excel at understanding image-text relationships but struggle to adapt to new data without forgetting prior knowledge. To address this, models are typically fine-tuned on both new task data and a memory buffer of samples from past tasks. However, CLIP's contrastive loss suffers when the memory buffer is small, leading to performance degradation on previous tasks. We propose a memory-efficient, distributionally robust method that dynamically reweights losses per class during training. Our approach, tested in class-incremental settings (CIFAR-100, ImageNet1K) and a domain-incremental setting (DomainNet), adapts CLIP models quickly while minimizing catastrophic forgetting, even with minimal memory usage.
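To make the idea of per-class loss reweighting concrete, here is a minimal sketch. It is not the method from the paper, whose exact update rule is not given here; it swaps in a generic exponentiated-gradient update of the kind used in group distributionally robust optimization, treating each class as a group. The function names (`update_class_weights`, `reweighted_loss`) and the `step_size` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def update_class_weights(weights, per_sample_loss, labels, step_size=0.1):
    """One exponentiated-gradient step on per-class weights (illustrative only).

    weights:         shape (num_classes,), non-negative, sums to 1
    per_sample_loss: shape (batch,), loss of each sample in the batch
    labels:          shape (batch,), integer class index of each sample
    """
    num_classes = weights.shape[0]
    # Sum of losses and number of samples for each class in the batch.
    loss_sum = np.bincount(labels, weights=per_sample_loss, minlength=num_classes)
    counts = np.bincount(labels, minlength=num_classes)
    # Mean loss per class; classes absent from the batch are treated as
    # having zero loss in this step (a simplifying assumption of this sketch).
    class_loss = np.divide(
        loss_sum, counts, out=np.zeros(num_classes), where=counts > 0
    )
    # Up-weight classes with high loss, then renormalize so weights sum to 1.
    new_weights = weights * np.exp(step_size * class_loss)
    new_weights = new_weights / new_weights.sum()
    return new_weights, class_loss


def reweighted_loss(weights, class_loss):
    """Training objective for the batch: weighted sum of per-class mean losses."""
    return float(np.dot(weights, class_loss))
```

In a training loop under these assumptions, `per_sample_loss` would come from the model's loss on a batch that mixes new-task data with buffer samples, and `reweighted_loss` would replace the plain batch mean. Classes that are currently being forgotten show higher loss and therefore receive larger weights, which is the intuition behind a distributionally robust objective.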