Feature compression is the root cause of adversarial fragility in neural network classifiers
The work addresses the vulnerability of neural networks to adversarial attacks, a critical issue for AI security, although it builds on prior feature-compression explanations.
The paper analyzes adversarial fragility in neural network classifiers, showing that their robustness can degrade as the input dimension d grows, to as little as 1/√d of the robustness of optimal classifiers. Numerical experiments, including on networks trained for ImageNet, agree with the theory.
In this paper, we study the adversarial robustness of deep neural networks (NNs) for classification tasks by comparing it against that of optimal classifiers. We consider the smallest magnitude of additive perturbation that can change a classifier's output. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that a neural network's adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness, that of optimal classifiers. Our theory matches numerical experiments on practically trained NNs remarkably well, including NNs for ImageNet images. The matrix-theoretic explanation is consistent with an earlier information-theoretic, feature-compression-based explanation for the adversarial fragility of neural networks.
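
As a toy illustration of the quantity the abstract compares (the smallest additive perturbation that changes a classifier's output), the following sketch may help. It is not the paper's construction. It assumes a binary linear classifier sign(w·x + b), perturbations measured in the L2 norm, and two class centers at +mu and -mu. The function names `min_perturbation_linear` and `robustness_ratio` are hypothetical. The "compressed" classifier here decides using a single random direction q, which is one simple way to mimic a classifier that relies on a compressed feature instead of the full input.

```python
import numpy as np


def min_perturbation_linear(w, b, x):
    """Smallest L2 norm of delta such that w @ (x + delta) + b reaches 0.

    For a linear classifier sign(w @ x + b), this is the distance from x
    to the decision hyperplane: |w @ x + b| / ||w||.
    """
    return abs(w @ x + b) / np.linalg.norm(w)


def robustness_ratio(d, trials=2000, seed=0):
    """RMS robustness of random one-direction classifiers, relative to the
    max-margin classifier, for class centers +mu and -mu in dimension d.

    Toy assumption: each "compressed" classifier is sign(q @ x), with q
    drawn uniformly from the unit sphere.
    """
    rng = np.random.default_rng(seed)

    # Class centers at +mu and -mu, with ||mu|| = 1.
    mu = np.zeros(d)
    mu[0] = 1.0

    # The max-margin classifier uses w = mu, so the distance from +mu to
    # its boundary is ||mu||.
    optimal = min_perturbation_linear(mu, 0.0, mu)

    # Random unit directions q; sign(q @ x) still separates +mu from -mu
    # (up to a label flip), but the distance from +mu to its boundary is
    # only |q @ mu|.
    q = rng.standard_normal((trials, d))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    compressed = np.abs(q @ mu)

    # Root-mean-square over directions, since E[(q @ mu)^2] = ||mu||^2 / d.
    rms = np.sqrt(np.mean(compressed ** 2))
    return rms / optimal


if __name__ == "__main__":
    for d in (16, 256, 4096):
        print(d, robustness_ratio(d), 1.0 / np.sqrt(d))
```

In this toy setting the ratio tracks $1/\sqrt{d}$ because a uniformly random unit vector q satisfies $E[(q \cdot \mu)^2] = \|\mu\|^2 / d$. The sketch only illustrates the scaling. It says nothing about how trained networks arrive at such directions.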