Should Adversarial Attacks Use Pixel p-Norm?
This work addresses a gap in understanding how adversarial perturbations should be measured, a question relevant to researchers and practitioners in machine learning security. Its contribution is incremental: it critiques existing measures without proposing a new one.
Using a behavioral study, the authors show that the pixel $p$-norm and other common measures used to evaluate adversarial attacks on image classification systems do not align with human perception.
Adversarial attacks aim to confound machine learning systems, while remaining virtually imperceptible to humans. Attacks on image classification systems are typically gauged in terms of $p$-norm distortions in the pixel feature space. We perform a behavioral study, demonstrating that the pixel $p$-norm for any $0\le p \le \infty$, and several alternative measures including earth mover's distance, structural similarity index, and deep net embedding, do not fit human perception. Our result has the potential to improve the understanding of adversarial attack and defense strategies.
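
To make the quantity under discussion concrete, the following is a minimal sketch of how a pixel $p$-norm distortion between an original image and its perturbed version might be computed. It is an illustration and not the authors' code: the function name `pixel_p_norm` and the flattened-list image representation are hypothetical. It assumes the usual conventions that $p=0$ counts the number of changed pixels and $p=\infty$ takes the largest single-pixel change; for $0<p<1$ the same formula is applied even though it is not a true norm.

```python
import math
from typing import Sequence


def pixel_p_norm(original: Sequence[float],
                 perturbed: Sequence[float],
                 p: float) -> float:
    """Distortion between two flattened images, measured in pixel space.

    Hypothetical helper for illustration; assumes both images are given
    as flat sequences of pixel intensities of equal length.
    """
    if len(original) != len(perturbed):
        raise ValueError("images must have the same number of pixels")
    if p < 0:
        raise ValueError("p must satisfy 0 <= p <= infinity")

    # Absolute per-pixel change introduced by the perturbation.
    diffs = [abs(a - b) for a, b in zip(original, perturbed)]

    if p == 0:
        # Assumed convention: count the pixels that changed at all.
        return float(sum(1 for d in diffs if d != 0))
    if math.isinf(p):
        # Assumed convention: the largest single-pixel change.
        return float(max(diffs, default=0.0))
    # General case: (sum_i |d_i|^p)^(1/p).
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Usage: a 4-pixel "image" in which two pixels are perturbed.
clean = [0.0, 0.0, 0.0, 0.0]
attacked = [3.0, 0.0, 4.0, 0.0]
for p in (0, 1, 2, math.inf):
    print(p, pixel_p_norm(clean, attacked, p))
```

The same perturbation receives a different size under each choice of $p$, which is why the study's question of whether any single $p$ tracks human perception is not trivial.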