LGSep 26, 2024
Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language ModelsIndu Solomon, Aye Phyu Phyu Aung, Uttam Kumar et al.
Continual learning (CL) enables models to adapt to evolving data streams without catastrophic forgetting, a fundamental requirement for real-world AI systems. However, the current methods often depend on large replay buffers or heavily annotated datasets which are impractical due to storage, privacy, and cost constraints. We propose CLTS (Continual Learning via Text-Image Synergy), a novel class-incremental framework that mitigates forgetting without storing real task data. CLTS leverages pre-trained vision-language models, BLIP (Bootstrapping Language-Image Pre-training) for caption generation and stable diffusion for sample generation. Each task is handled by a dedicated Task Head, while a Task Router learns to assign inputs to the correct Task Head using the generated data. On three benchmark datasets, CLTS improves average task accuracy by up to 54% and achieves 63 times better memory efficiency compared to four recent continual learning baselines, demonstrating improved retention and adaptability. CLTS introduces a novel perspective by integrating generative text-image augmentation for scalable continual learning.
LGMay 23, 2024
U-TELL: Unsupervised Task Expert Lifelong LearningIndu Solomon, Aye Phyu Phyu Aung, Uttam Kumar et al.
Continual learning (CL) models are designed to learn new tasks arriving sequentially without re-training the network. However, real-world ML applications have very limited label information and these models suffer from catastrophic forgetting. To address these issues, we propose an unsupervised CL model with task experts called Unsupervised Task Expert Lifelong Learning (U-TELL) to continually learn the data arriving in a sequence addressing catastrophic forgetting. During training of U-TELL, we introduce a new expert on arrival of a new task. Our proposed architecture has task experts, a structured data generator and a task assigner. Each task expert is composed of 3 blocks; i) a variational autoencoder to capture the task distribution and perform data abstraction, ii) a k-means clustering module, and iii) a structure extractor to preserve latent task data signature. During testing, task assigner selects a suitable expert to perform clustering. U-TELL does not store or replay task samples, instead, we use generated structured samples to train the task assigner. We compared U-TELL with five SOTA unsupervised CL methods. U-TELL outperformed all baselines on seven benchmarks and one industry dataset for various CL scenarios with a training time over 6 times faster than the best performing baseline.