CASP: Few-Shot Class-Incremental Learning with CLS Token Attention Steering Prompts
This work addresses catastrophic forgetting and adaptation from limited data in continual learning, and it represents an incremental improvement over existing prompt-based methods.
The paper tackles few-shot class-incremental learning by proposing CASP, a method that uses CLS token attention steering prompts to modulate self-attention weights and enhance generalization. CASP achieves state-of-the-art performance on CUB200, CIFAR100, and ImageNet-R without fine-tuning during incremental phases and with reduced parameter overhead.
Few-shot class-incremental learning (FSCIL) presents a core challenge in continual learning: models must rapidly adapt to new classes from very limited samples while mitigating catastrophic forgetting. Recent prompt-based methods, which integrate pretrained backbones with task-specific prompts, have made notable progress. Under extreme few-shot incremental settings, however, the model's ability to transfer and generalize becomes critical, so it is essential to leverage pretrained knowledge during the base session to learn feature representations that can be shared with future categories. Inspired by the way the CLS token, much like human attention, progressively filters out task-irrelevant information, we propose CLS Token Attention Steering Prompts (CASP). This approach introduces class-shared trainable bias parameters into the query, key, and value projections of the CLS token to explicitly modulate the self-attention weights. To further enhance generalization, we also design an attention perturbation strategy and perform Manifold Token Mixup in the shallow feature space, synthesizing potential new-class features and reserving representation capacity for upcoming tasks. Experiments on the CUB200, CIFAR100, and ImageNet-R datasets demonstrate that CASP outperforms state-of-the-art methods in both standard and fine-grained FSCIL settings, without requiring fine-tuning during incremental phases and while significantly reducing the parameter overhead.
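To make the two mechanisms named above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single attention head, that the shared biases are added to the CLS token's query, key, and value after the linear projections, and that Manifold Token Mixup is a convex combination of two samples' token features. The names `cls_attention`, `token_mixup`, `bq`, `bk`, `bv`, and `lam` are hypothetical and chosen only for illustration.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cls_attention(tokens, Wq, Wk, Wv, bq=None, bk=None, bv=None):
    """Single-head attention as seen from the CLS token (tokens[0]).

    Assumption: the optional biases are shared across classes and are
    added only to the CLS token's projected query, key, and value.
    Returns (attention weights of CLS over all tokens, CLS output).
    """
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    # Steer only the CLS row (index 0); patch tokens are left unchanged.
    for proj, bias in ((q, bq), (k, bk), (v, bv)):
        if bias is not None:
            proj[0] = [p + b for p, b in zip(proj[0], bias)]
    scale = math.sqrt(len(Wq))  # square root of the query/key dimension
    scores = [sum(a * b for a, b in zip(q[0], kj)) / scale for kj in k]
    weights = softmax(scores)
    out = [sum(w * vj[i] for w, vj in zip(weights, v))
           for i in range(len(Wv))]
    return weights, out

def token_mixup(tokens_a, tokens_b, lam):
    """Assumed mixup form: lam * a + (1 - lam) * b, token by token."""
    return [[lam * a + (1 - lam) * b for a, b in zip(ta, tb)]
            for ta, tb in zip(tokens_a, tokens_b)]

# Toy usage: identity projections, one CLS token plus two patch tokens.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
base_w, _ = cls_attention(tokens, identity, identity, identity)
steer_w, _ = cls_attention(tokens, identity, identity, identity,
                           bq=[0.0, 5.0])
mixed = token_mixup([[1.0, 0.0]], [[0.0, 1.0]], 0.25)
```

In this toy setting the query bias shifts the CLS token's attention toward the third token, while the patch tokens' own projections are untouched. In a real model the biases would be trained in the base session and the mixing would be applied to intermediate features of a shallow transformer layer; the paper's exact formulation may differ from these assumptions.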