Towards Imperceptible Query-limited Adversarial Attacks with Perceptual Feature Fidelity Loss
This work addresses the challenge of keeping adversarial attacks on image classifiers imperceptible, particularly in black-box settings with limited queries, and represents an incremental improvement over existing methods.
The paper proposes a perceptual metric based on low-level image feature fidelity for crafting adversarial attacks that are imperceptible to humans. It shows that the metric robustly reflects imperceptibility and can be integrated into existing optimization frameworks, improving results in query-limited black-box attacks.
Recently, there has been a large amount of work on fooling deep-learning-based classifiers, particularly for images, via adversarial inputs that are visually similar to the benign examples. However, researchers usually use Lp-norm minimization as a proxy for imperceptibility, which oversimplifies the diversity and richness of real-world images and human visual perception. In this work, we propose a novel perceptual metric, which we call Perceptual Feature Fidelity Loss, that builds on the well-established connection between low-level image feature fidelity and human visual sensitivity. We show that our metric robustly reflects and describes the imperceptibility of the generated adversarial images, as validated under various conditions. Moreover, we demonstrate that the metric is highly flexible: it can be conveniently integrated into different existing optimization frameworks to guide the noise distribution towards better imperceptibility. The metric is particularly useful in challenging black-box attacks with limited queries, where imperceptibility is hard to achieve because the required perturbation power is non-trivial.
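To make the limitation of the Lp-norm proxy concrete, the following minimal sketch compares two perturbations that have exactly the same L2 norm but change the low-level structure of an image very differently. The feature distance here uses simple finite-difference gradients as a stand-in for "low-level image features"; this choice, and the names `gradient_features` and `feature_fidelity_distance`, are assumptions made for illustration and are not the definition of the Perceptual Feature Fidelity Loss proposed in the paper.

```python
import numpy as np


def lp_norm(delta, p=2):
    """Lp norm of a perturbation, treating the image as a flat vector."""
    return float(np.linalg.norm(delta.ravel(), ord=p))


def gradient_features(img):
    """Horizontal and vertical finite differences.

    Assumption: a simple stand-in for low-level image features,
    not the feature set used in the paper.
    """
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return gx, gy


def feature_fidelity_distance(clean, adv):
    """Euclidean distance between the gradient features of two images."""
    gx_c, gy_c = gradient_features(clean)
    gx_a, gy_a = gradient_features(adv)
    return float(np.sqrt(np.sum((gx_c - gx_a) ** 2) + np.sum((gy_c - gy_a) ** 2)))


# A flat grey 32x32 "image" keeps the example deterministic.
clean = np.full((32, 32), 0.5)
eps = 1.0  # shared L2 budget for both perturbations

# Perturbation 1: a uniform brightness offset.
smooth = np.ones((32, 32))
delta_smooth = smooth * (eps / np.linalg.norm(smooth))

# Perturbation 2: a pixel-level checkerboard of +1 / -1.
checker = np.indices((32, 32)).sum(axis=0) % 2 * 2 - 1
delta_checker = checker * (eps / np.linalg.norm(checker))

l2_smooth = lp_norm(delta_smooth)
l2_checker = lp_norm(delta_checker)
feat_smooth = feature_fidelity_distance(clean, clean + delta_smooth)
feat_checker = feature_fidelity_distance(clean, clean + delta_checker)

print(f"L2 norm:          smooth={l2_smooth:.3f}  checker={l2_checker:.3f}")
print(f"feature distance: smooth={feat_smooth:.3f}  checker={feat_checker:.3f}")
```

Both perturbations spend the same L2 budget, so an Lp-norm objective cannot tell them apart, whereas the feature-based distance is zero for the uniform offset and clearly non-zero for the checkerboard. A term of this kind could, in principle, be added to an attack objective to steer where the noise is placed, which is the role the abstract describes for the proposed metric.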