Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
This work addresses a fundamental question in adversarial robustness and is relevant to researchers in machine learning security, but its contribution is incremental, since it focuses on a specific theoretical setting.
The paper studies why neural networks are vulnerable to adversarial examples by analyzing two-layer ReLU networks trained on data that lie in a low-dimensional linear subspace. It shows that standard gradient methods produce networks with large gradients in directions orthogonal to the data subspace, which makes them susceptible to small $L_2$ perturbations, and that reducing the initialization scale or adding $L_2$ regularization can improve robustness.
Despite a great deal of research, it is still not well understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained on data that lie on a low-dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely networks that have large gradients in directions orthogonal to the data subspace and are susceptible to small adversarial $L_2$-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.
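A minimal NumPy sketch of the mechanism follows. It rests on assumptions that are ours, not necessarily the paper's exact setup: only the first layer $W$ is trained, the output weights $u_j$ are fixed at random $\pm 1/\sqrt{m}$, the loss is logistic, training is full-batch gradient descent, and all sizes and hyperparameters are arbitrary. Because every gradient step on a neuron $w_j$ is a linear combination of training inputs, the components of $W$ orthogonal to the data subspace are never updated by plain gradient descent and keep their random initial values, while $L_2$ regularization (weight decay $\lambda$ with step size $\eta$) shrinks them by a factor $(1-\eta\lambda)$ per step. The helper names (`make_subspace_data`, `train_first_layer`, `input_gradient`, `split_norms`) are hypothetical. The script compares the norm of the input gradient $\nabla_x N(x)$ inside and orthogonal to the subspace for a large initialization, a small initialization, and a large initialization with weight decay.

```python
import numpy as np


def make_subspace_data(n, d, k, rng):
    """Sample n points on a random k-dim linear subspace of R^d, labelled by
    the sign of a random linear function on that subspace (illustrative data)."""
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k, orthonormal columns
    z = rng.standard_normal((n, k))
    X = z @ basis.T                                         # every row lies in span(basis)
    y = np.sign(z @ rng.standard_normal(k))
    y[y == 0] = 1.0
    return X, y, basis


def train_first_layer(X, y, m, init_scale, weight_decay, lr, steps, rng):
    """Full-batch GD on the mean logistic loss for N(x) = sum_j u_j relu(w_j . x).
    Assumption: only W is trained; u is fixed at random +-1/sqrt(m)."""
    n, d = X.shape
    W = init_scale * rng.standard_normal((m, d)) / np.sqrt(d)
    u = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    for _ in range(steps):
        pre = X @ W.T                                # n x m pre-activations
        out = np.maximum(pre, 0.0) @ u               # n network outputs
        # derivative of log(1 + exp(-y*out)) w.r.t. out, averaged over samples
        g_out = -y / (1.0 + np.exp(np.clip(y * out, -50.0, 50.0))) / n
        # dL/dw_j = sum_i g_out_i * u_j * 1[pre_ij > 0] * x_i, a combination of data rows
        grad_W = ((g_out[:, None] * (pre > 0)) * u).T @ X
        W -= lr * (grad_W + weight_decay * W)        # weight decay = L2 regularization
    return W, u


def input_gradient(W, u, x):
    """Gradient of N at input x: sum_j u_j * 1[w_j . x > 0] * w_j."""
    active = (W @ x > 0).astype(float)
    return (u * active) @ W


def split_norms(grad, basis):
    """Norms of the components of grad inside and orthogonal to span(basis)."""
    on = basis @ (basis.T @ grad)
    return np.linalg.norm(on), np.linalg.norm(grad - on)


rng = np.random.default_rng(0)
d, k, n, m = 200, 5, 100, 500       # ambient dim, subspace dim, samples, width
X, y, basis = make_subspace_data(n, d, k, rng)
x_test = X[0]                        # a point on the data subspace

results = {}
for init_scale, wd in [(1.0, 0.0), (0.01, 0.0), (1.0, 1e-2)]:
    # same seed, so all runs share the same initial directions and output weights
    W, u = train_first_layer(X, y, m, init_scale, wd, lr=0.5, steps=500,
                             rng=np.random.default_rng(1))
    on, off = split_norms(input_gradient(W, u, x_test), basis)
    results[(init_scale, wd)] = (on, off)
    print(f"init_scale={init_scale}, weight_decay={wd}: "
          f"on-subspace |grad|={on:.3f}, orthogonal |grad|={off:.3f}")
```

Under these assumptions the orthogonal part of the input gradient scales roughly with the initialization scale and is damped by weight decay, which matches the qualitative claim in the abstract; the sketch does not reproduce the paper's bounds or its construction of adversarial perturbations.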