LG CV Jun 4, 2025

Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning

arXiv:2506.15720v1 3 citations h-index: 43 CVPR

Originality: Incremental advance

AI Analysis

This addresses the challenge of continual learning with limited data for AI systems that need to adapt to new classes over time, representing an incremental improvement over existing FSCIL methods.

The paper tackles the problem of catastrophic forgetting and overfitting in few-shot class incremental learning (FSCIL) by introducing a tripartite weight-space ensemble method that interpolates base, previous, and current models, achieving state-of-the-art results on miniImageNet, CUB200, and CIFAR100 datasets.

Few-shot class-incremental learning (FSCIL) enables the continual learning of new concepts from only a few training examples. In FSCIL, the model undergoes substantial updates, making it prone to forgetting previous concepts and overfitting to the limited new examples. The most recent trend is to disentangle the learning of the representation from the classification head of the model: a well-generalized feature extractor is learned on the base classes (many examples and many classes) and then fixed during incremental learning. Arguing that the fixed feature extractor restricts the model's adaptability to new classes, we introduce a novel FSCIL method that effectively addresses catastrophic forgetting and overfitting. Our method makes it possible to seamlessly update the entire model with a few examples. Our main proposal is a tripartite weight-space ensemble (Tri-WE). Tri-WE interpolates the base, immediately previous, and current models in weight space, especially for the classification heads of the models, and thereby collaboratively maintains knowledge from the base and previous models. In addition, we recognize the challenge of distilling generalized representations from the previous model using scarce data. Hence, we propose a regularization loss term using amplified data knowledge distillation. By simply intermixing the few-shot data, we can produce richer data that enables the distillation of critical knowledge from the previous model. Consequently, we attain state-of-the-art results on the miniImageNet, CUB200, and CIFAR100 datasets.
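
The abstract does not give the interpolation formula, so the following is only a minimal sketch of what a three-way weight-space interpolation of classification heads could look like. It assumes a convex combination with two hypothetical coefficients, `alpha` (base model) and `beta` (previous model), and assumes that the three heads are flat weight vectors of equal length. The function name and the coefficients are illustrative and are not taken from the paper.

```python
def interpolate_heads(base, prev, curr, alpha, beta):
    """Blend three classifier-head weight vectors in weight space.

    Assumption (not from the paper): a convex combination
        alpha * base + beta * prev + (1 - alpha - beta) * curr
    applied element-wise to equally sized lists of floats.
    """
    if not (len(base) == len(prev) == len(curr)):
        raise ValueError("all three heads must have the same length")

    gamma = 1.0 - alpha - beta  # weight left for the current model
    # Allow a tiny tolerance for floating-point rounding in gamma.
    if alpha < 0 or beta < 0 or gamma < -1e-9:
        raise ValueError("alpha and beta must be >= 0 with alpha + beta <= 1")
    gamma = max(gamma, 0.0)

    return [alpha * b + beta * p + gamma * c
            for b, p, c in zip(base, prev, curr)]


# Toy heads: two weights per model.
base_head = [1.0, 0.0]
prev_head = [0.0, 1.0]
curr_head = [1.0, 1.0]

merged = interpolate_heads(base_head, prev_head, curr_head,
                           alpha=0.25, beta=0.25)
print(merged)  # [0.75, 0.75]
```

Setting `alpha = beta = 0` returns the current head unchanged, while larger values pull the merged head toward the base and previous models. How the paper actually chooses or schedules these weights is not stated in the abstract.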
