First-order Adversarial Vulnerability of Neural Networks and Input Dimension
This work addresses a critical security issue for AI systems built on neural networks, particularly in image processing. It reveals a fundamental dimension-dependent vulnerability, and it builds incrementally on existing adversarial attack research.
The paper shows that the adversarial vulnerability of neural networks increases with input dimension, and it traces this to the gradients of the training objective with respect to the inputs. At initialization, the $\ell_1$-norm of these gradients grows as the square root of the input dimension; the dependence persists after training, though it is attenuated by regularization.
Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard network architectures, we prove that at initialization, the $\ell_1$-norm of these gradients grows as the square root of the input dimension, leaving the networks increasingly vulnerable with growing image size. We empirically show that this dimension dependence persists after either usual or robust training, but gets attenuated with higher regularization.
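The square-root scaling at initialization can be checked numerically with a minimal sketch, which is not the paper's experimental setup. It assumes a one-hidden-layer ReLU network with He-style Gaussian initialization and uses the scalar network output as a stand-in for the training objective. The function names, the hidden width of 256, and the number of trials are all illustrative choices. The link to vulnerability is the standard first-order argument: a perturbation bounded by $\epsilon$ in $\ell_\infty$-norm can change a differentiable function by about $\epsilon$ times the $\ell_1$-norm of its input gradient.

```python
import numpy as np


def input_gradient_l1(d, hidden=256, rng=None):
    """L1 norm of d f / d x for f(x) = w2 . relu(W1 x) at random initialization.

    Assumption: He-style init (variance 2 / fan_in), chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(hidden, d))
    w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=hidden)
    x = rng.normal(size=d)

    # ReLU derivative: 1 where the pre-activation is positive, 0 elsewhere.
    active = (W1 @ x > 0).astype(float)

    # Chain rule: grad_x f = W1^T (w2 * relu'(W1 x)).
    grad = W1.T @ (w2 * active)
    return np.abs(grad).sum()


def mean_l1(d, trials=20, seed=0):
    """Average the L1 gradient norm over several random networks and inputs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([input_gradient_l1(d, rng=rng) for _ in range(trials)]))


if __name__ == "__main__":
    for d in (64, 256, 1024, 4096):
        l1 = mean_l1(d)
        # If the norm scales as sqrt(d), the last column stays roughly constant.
        print(f"d={d:5d}  l1={l1:8.3f}  l1/sqrt(d)={l1 / np.sqrt(d):.3f}")
```

Under these assumptions, each gradient coordinate has a typical magnitude of order $1/\sqrt{d}$, so summing $d$ absolute values gives an $\ell_1$-norm of order $\sqrt{d}$. The ratio printed in the last column should therefore stay roughly flat as $d$ grows.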