Multi-attacks: Many images $+$ the same adversarial attack $\to$ many target labels
This work exposes a significant vulnerability in machine learning classifiers by demonstrating that multi-attacks are feasible, which could affect security-critical applications that rely on image recognition.
The authors design a single adversarial perturbation that simultaneously changes the classification of multiple images to desired target labels, and achieve this for up to hundreds of images and target classes at once. They characterize the maximum number of images that can be attacked under various conditions, and estimate that around $10^{\mathcal{O}(100)}$ regions of high class confidence surround a particular image in pixel space, which poses a significant problem for exhaustive defense strategies.
We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1,X_2,\dots,X_n$ from their original, unperturbed classes $c_1, c_2,\dots,c_n$ to desired (not necessarily all the same) classes $c^*_1,c^*_2,\dots,c^*_n$ for up to hundreds of images and target classes at once. We call these \textit{multi-attacks}. By characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for two-dimensional sections of that space in which regions of particular classes trace images and spell words. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
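To make the multi-attack setup concrete, the following is a minimal sketch, not the released code: one shared perturbation $P$ is optimized so that each $X_i + P$ is classified as its own target $c^*_i$. Everything specific here is an assumption made for illustration: the classifier is a small random two-layer ReLU network rather than a trained image model, the "images" are random vectors, the optimizer is plain gradient descent on the mean cross-entropy, and the names `forward`, `loss_and_grad`, and `multi_attack` are hypothetical.

```python
import numpy as np

def forward(X, params):
    """Toy two-layer ReLU network (an assumed stand-in for a real classifier)."""
    W1, b1, W2 = params
    Z = X @ W1.T + b1              # pre-activations, shape (n, hidden)
    H = np.maximum(Z, 0.0)         # ReLU
    return Z, H, H @ W2.T          # logits, shape (n, n_classes)

def loss_and_grad(P, X, targets, params):
    """Mean cross-entropy of the classifier on X_i + P against the targets,
    its gradient with respect to the shared perturbation P, and predictions."""
    W1, _, W2 = params
    n = X.shape[0]
    Z, _, logits = forward(X + P, params)   # the same P is added to every image
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(n), targets].mean()
    dlogits = np.exp(log_probs)             # softmax probabilities
    dlogits[np.arange(n), targets] -= 1.0   # d(loss)/d(logits), before the 1/n
    dlogits /= n
    dZ = (dlogits @ W2) * (Z > 0)           # backpropagate through the ReLU
    grad_P = (dZ @ W1).sum(axis=0)          # P is shared, so per-image gradients add
    return loss, grad_P, logits.argmax(axis=1)

def multi_attack(X, targets, params, steps=500, lr=0.5):
    """Gradient descent on P; returns the lowest-loss perturbation seen."""
    P = np.zeros(X.shape[1])
    best = None
    initial_loss = None
    for step in range(steps + 1):
        loss, grad, pred = loss_and_grad(P, X, targets, params)
        if step == 0:
            initial_loss = loss             # loss of the unperturbed images
        if best is None or loss < best[0]:
            best = (loss, P.copy(), pred)
        P = P - lr * grad
    best_loss, best_P, best_pred = best
    return best_P, initial_loss, best_loss, best_pred

rng = np.random.default_rng(0)
d, hidden, n_classes, n = 64, 128, 10, 4    # toy sizes, chosen arbitrarily
params = (
    rng.normal(0.0, 1.0 / np.sqrt(d), (hidden, d)),
    rng.normal(0.0, 0.1, hidden),
    rng.normal(0.0, 1.0 / np.sqrt(hidden), (n_classes, hidden)),
)
X = rng.normal(size=(n, d))                 # n random stand-in "images"
original = forward(X, params)[2].argmax(axis=1)
targets = (original + 1) % n_classes        # each target differs from the original class

P, initial_loss, best_loss, pred = multi_attack(X, targets, params)
success_rate = float(np.mean(pred == targets))
print("loss before:", initial_loss, "loss after:", best_loss)
print("fraction of images moved to their target class:", success_rate)
```

The gradient is summed over images because $P$ enters every input identically, which is what separates a multi-attack from $n$ independent single-image attacks. A purely linear classifier would not serve in this sketch: there $P$ shifts the logits of every image by the same vector, so it cannot push different images toward different targets, which is why a nonlinear toy network is used.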