Attention Masks Help Adversarial Attacks to Bypass Safety Detectors
This paper addresses the challenge of bypassing safety detectors in AI systems. The contribution appears incremental: it builds on existing adversarial attack methods and adds specific improvements.
The paper aims to make adversarial attacks stealthier against XAI safety monitors. It develops an adaptive attention mask generation framework that guides PGD attacks, and it reports a better balance of stealth, efficiency, and explainability than the benchmarks PGD, Sparsefool, and SINIFGSM on the MNIST and CIFAR-10 datasets.
Despite recent advances in adversarial attack methods, current approaches against XAI monitors remain detectable and slow. In this paper, we present an adaptive framework for attention mask generation that enables stealthy, explainable, and efficient PGD adversarial attacks on image classifiers under XAI monitors. Specifically, we use a mutation XAI mixture and a multitask self-supervised X-UNet to generate attention masks that guide the PGD attack. Experiments on MNIST (MLP) and CIFAR-10 (AlexNet) show that our system outperforms the benchmark PGD and Sparsefool attacks and the SOTA SINIFGSM in balancing stealth, efficiency, and explainability, which is crucial for effectively fooling classifiers protected by SOTA defenses.
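
To make the idea of a mask-guided PGD attack concrete, the following is a minimal sketch, not the authors' implementation. It assumes the attention mask is already given as per-pixel weights in [0, 1] and that the mask guides the attack by scaling each signed gradient step elementwise; the abstract does not state how the mask enters the update, so this is only one plausible reading. The mask generation itself (mutation XAI mixture, X-UNet) is not sketched. The toy linear softmax classifier and the names `masked_pgd`, `input_gradient`, `eps`, `alpha`, and `steps` are all hypothetical and chosen for illustration.

```python
import math

def logits(W, b, x):
    # Toy linear classifier (an assumption for illustration): z = W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(W, b, x, y):
    return -math.log(softmax(logits(W, b, x))[y])

def input_gradient(W, b, x, y):
    # For softmax cross-entropy, dL/dz = p - onehot(y); chain through z = W x + b.
    p = softmax(logits(W, b, x))
    p[y] -= 1.0
    return [sum(p[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def masked_pgd(W, b, x, y, mask, eps=0.3, alpha=0.05, steps=20):
    """Untargeted L-infinity PGD whose step is scaled per pixel by `mask`."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(W, b, x_adv, y)
        for i, (x0, m) in enumerate(zip(x, mask)):
            # Assumed guidance rule: pixels with mask 0 are never perturbed.
            v = x_adv[i] + alpha * m * sign(g[i])
            v = min(max(v, x0 - eps), x0 + eps)   # project onto the eps-ball around x
            x_adv[i] = min(max(v, 0.0), 1.0)      # keep a valid pixel range
    return x_adv

# Hypothetical 4-pixel "image", two classes, and a mask that allows
# perturbation only on the first two pixels.
W = [[1.0, -0.5, 0.8, 0.2], [-0.6, 0.9, -0.4, 0.1]]
b = [0.0, 0.0]
x = [0.5, 0.5, 0.5, 0.5]
mask = [1.0, 1.0, 0.0, 0.0]
x_adv = masked_pgd(W, b, x, y=0, mask=mask)
```

In this sketch, restricting the perturbation to the masked pixels is what stands in for stealth: pixels outside the mask stay identical to the input, while the masked pixels move within the `eps` budget to increase the classifier's loss on the true label.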