FG-UAP: Feature-Gathering Universal Adversarial Perturbation
This work contributes to robustness analysis of deep neural networks by exposing intrinsic vulnerabilities with a new attack method, although the contribution is incremental, since it builds on existing UAP and Neural Collapse concepts.
The paper tackles the problem of generating universal adversarial perturbations (UAPs), which are independent of the input image, by exploiting Neural Collapse, the collapse of feature variability during the terminal phase of training. The resulting method, FG-UAP, attacks effectively across various architectures, including Vision Transformers, with high success rates in both untargeted and targeted settings.
Deep Neural Networks (DNNs) are susceptible to elaborately designed perturbations, whether those perturbations depend on the image or not. The latter, called Universal Adversarial Perturbations (UAPs), are very attractive for model robustness analysis, since their independence of the input reveals intrinsic characteristics of the model. A related and interesting observation is Neural Collapse (NC), in which feature variability may collapse during the terminal phase of training. Motivated by this, we propose to generate UAPs by attacking the layer where the NC phenomenon occurs. Because of NC, the proposed attack could gather the features of all natural images around itself, hence the name Feature-Gathering UAP (FG-UAP). We evaluate the effectiveness of our proposed algorithm through extensive experiments, including untargeted and targeted universal attacks, attacks with a limited dataset, and transfer-based black-box attacks among different architectures, including Vision Transformers, which are believed to be more robust. Furthermore, we investigate FG-UAP from the perspective of NC by analyzing the labels and extracted features of adversarial examples, finding that the collapse phenomenon becomes stronger after the model is corrupted. The code will be released when the paper is accepted.
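The abstract describes the mechanism only in words: a single perturbation shared by all images is optimized against the features of the NC layer, and the perturbed features end up gathered together. The sketch below illustrates that idea under loudly stated assumptions. A random linear map `W` is a hypothetical stand-in for the NC layer; the loss (mean cosine similarity between clean and perturbed features, minimized) is an assumed objective, not one confirmed by the text above; and the optimizer is a PGD-style sign-gradient step with L∞ clipping at radius `eps`, also an assumption. All function names are illustrative.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def matvec(W, x):
    """W @ x, with W stored as a list of rows."""
    return [dot(row, x) for row in W]


def rmatvec(W, g):
    """W^T @ g: pulls a feature-space gradient back to input space."""
    return [sum(W[r][c] * g[r] for r in range(len(W))) for c in range(len(W[0]))]


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def cosine_grad(u, v):
    """Gradient of cos(u, v) with respect to u."""
    nu, nv = norm(u), norm(v)
    c = dot(u, v) / (nu * nv)
    return [vi / (nu * nv) - c * ui / nu ** 2 for ui, vi in zip(u, v)]


def sign(t):
    return 0.0 if t == 0 else math.copysign(1.0, t)


def perturb(x, delta):
    return [xi + di for xi, di in zip(x, delta)]


def mean_cosine_to_clean(W, images, delta):
    """ASSUMED attack loss: average cos(f(x + delta), f(x)) with toy f(x) = W @ x."""
    return sum(cosine(matvec(W, perturb(x, delta)), matvec(W, x)) for x in images) / len(images)


def feature_gathering_uap(W, images, eps, steps=100, alpha=None, seed=0):
    """Optimize ONE perturbation shared by all images (a UAP) by sign-gradient
    descent on mean_cosine_to_clean, clipping to the L-infinity ball of radius eps."""
    rng = random.Random(seed)
    alpha = eps / 20 if alpha is None else alpha
    clean = [matvec(W, x) for x in images]
    # delta = 0 is a stationary point of the cosine loss, so start from small noise.
    delta = [rng.uniform(-eps / 10, eps / 10) for _ in images[0]]
    for _ in range(steps):
        g_feat = [0.0] * len(W)
        for x, v in zip(images, clean):
            g_u = cosine_grad(matvec(W, perturb(x, delta)), v)
            g_feat = [a + b for a, b in zip(g_feat, g_u)]
        # Chain rule for the linear map: d/d(delta) = W^T (sum of d cos / d u).
        grad = rmatvec(W, g_feat)
        # Descend, then clip every coordinate back into [-eps, eps].
        delta = [min(eps, max(-eps, di - alpha * sign(gi))) for di, gi in zip(delta, grad)]
    return delta


def mean_pairwise_cosine(feats):
    """Average cosine over all pairs of feature vectors (higher = more gathered)."""
    sims = [cosine(a, b) for i, a in enumerate(feats) for b in feats[i + 1:]]
    return sum(sims) / len(sims)


# Toy usage: random "NC layer" W and random inputs (both hypothetical).
rng = random.Random(42)
d, k, n, eps = 12, 8, 40, 2.0
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
images = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
delta = feature_gathering_uap(W, images, eps)
clean_feats = [matvec(W, x) for x in images]
adv_feats = [matvec(W, perturb(x, delta)) for x in images]
print("loss:", mean_cosine_to_clean(W, images, [0.0] * d), "->", mean_cosine_to_clean(W, images, delta))
print("pairwise cos:", mean_pairwise_cosine(clean_feats), "->", mean_pairwise_cosine(adv_feats))
```

Under these assumptions the mean cosine to the clean features falls from 1, while the mean pairwise cosine among perturbed features rises well above that of the clean features, which is the "gathering" effect. In this linear toy the effect arises simply because `W @ delta` grows larger than the clean features `W @ x`; it illustrates the behavior the abstract describes, not why that behavior emerges in a trained network.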