LG · Dec 3, 2025

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

arXiv:2512.03882v1

Originality: Highly original

AI Analysis

This work addresses security risks in continual learning for AI systems, offering a novel automated approach to attack discovery that incrementally improves attack efficiency.

The paper tackles security vulnerabilities in few-shot class-incremental learning (FSCIL) by proposing ACraft, an automated method that uses large language models to discover optimal attacks. ACraft significantly degrades the performance of state-of-the-art FSCIL methods and outperforms human expert-designed attacks while reducing costs.

Few-shot class-incremental learning (FSCIL) is a realistic and challenging continual-learning paradigm in which a model incrementally learns unseen classes from only a few training examples while avoiding catastrophic forgetting of the base classes. Previous efforts have centered primarily on developing more effective FSCIL approaches; by contrast, little attention has been devoted to the security issues of FSCIL. This paper aims to provide a holistic study of the impact of attacks on FSCIL. We first derive insights by systematically exploring how human expert-designed attack methods (i.e., PGD and FGSM) affect FSCIL. We find that these methods either fail to attack base classes or incur high labor costs because they rely heavily on expert knowledge. This highlights the need for an attack method specialized for FSCIL. Grounded in these insights, we propose ACraft, a simple yet effective method that leverages Large Language Models (LLMs) to automatically steer and discover optimal attack methods targeted at FSCIL without human experts. Moreover, to improve the reasoning between LLMs and FSCIL, we introduce a novel Proximal Policy Optimization (PPO) based reinforcement learning scheme that establishes positive feedback, so that the LLM generates better attack methods in each subsequent generation. Experiments on mainstream benchmarks show that ACraft significantly degrades the performance of state-of-the-art FSCIL methods and far outperforms human expert-designed attack methods while maintaining the lowest attack cost.
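
For readers unfamiliar with the two expert-designed baselines named in the abstract, the following is a minimal sketch of FGSM and PGD under an L-infinity budget. It is not ACraft and not the paper's experimental setup: the toy logistic-regression model, its weights, and the helper names (`loss_grad_wrt_input`, `fgsm`, `pgd`, `predict`) are all assumptions chosen only to keep the example self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model (assumed)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """FGSM: one step of size eps along the sign of the input gradient."""
    g = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, iters):
    """PGD: repeated small signed-gradient steps, each followed by a
    projection back onto the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input, chosen for illustration only.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

x_fgsm = fgsm(w, b, x, y, eps=0.5)
x_pgd = pgd(w, b, x, y, eps=0.5, step=0.1, iters=10)

print("clean:", predict(w, b, x))
print("FGSM :", predict(w, b, x_fgsm))
print("PGD  :", predict(w, b, x_pgd))
```

Both attacks push the input in the direction that increases the loss; PGD differs only in taking several smaller steps and projecting after each one. Hand-designing such attacks means choosing the budget, step size, and iteration count by hand for each setting.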
