Gradient Methods Provably Converge to Non-Robust Networks
This work addresses the adversarial vulnerability of neural networks and gives machine learning practitioners theoretical insight into why robustness fails. The contribution is incremental in that it builds on existing research on implicit bias and adversarial examples.
The paper tackles neural networks' susceptibility to adversarial examples by proving that, in natural settings, depth-2 ReLU networks trained with gradient flow converge to non-robust networks, even when robust networks that fit the training data exist. The cause is the implicit bias towards margin maximization, which itself induces non-robustness.
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples. In this work, we identify natural settings where depth-$2$ ReLU networks trained with gradient flow are provably non-robust (susceptible to small adversarial $\ell_2$-perturbations), even when robust networks that classify the training dataset correctly exist. Perhaps surprisingly, we show that the well-known implicit bias towards margin maximization induces bias towards non-robust networks, by proving that every network which satisfies the KKT conditions of the max-margin problem is non-robust.
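For reference, the max-margin problem and its KKT conditions can be written as below. This is a sketch in the form commonly used in the implicit-bias literature for homogeneous networks. The notation is an assumption and does not appear in the abstract: $\theta$ for the network parameters, $N_\theta$ for the network, $(x_i, y_i)$ with $y_i \in \{-1, 1\}$ for the $n$ training examples, and $\lambda_i$ for the multipliers.

$$
\min_{\theta} \; \tfrac{1}{2}\|\theta\|_2^2 \quad \text{s.t.} \quad y_i N_\theta(x_i) \ge 1 \quad \text{for all } i \in \{1, \dots, n\}
$$

A feasible $\theta$ is a KKT point of this problem if there exist $\lambda_1, \dots, \lambda_n \ge 0$ such that

$$
\theta = \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta N_\theta(x_i), \qquad \lambda_i \bigl( y_i N_\theta(x_i) - 1 \bigr) = 0 \quad \text{for all } i .
$$

Because ReLU is not differentiable at zero, $\nabla_\theta N_\theta(x_i)$ is typically read as an element of the Clarke subdifferential. Under this reading, the claim in the abstract is that every $\theta$ satisfying these conditions yields a network that small $\ell_2$-perturbations can fool.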