On the Geometry of Adversarial Examples
This work addresses the vulnerability of machine learning models to adversarial examples. Its theoretical results could inform more robust training, though the contribution is incremental in that it builds on existing geometric tools from the manifold reconstruction literature.
The paper proposes a geometric framework for analyzing the high-dimensional geometry of adversarial examples. Within this framework the authors prove a tradeoff between robustness under different norms, the sample inefficiency of adversarial training in balls around the data, and sufficient sampling conditions under which nearest neighbor classifiers and ball-based adversarial training are robust.
Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. We propose a geometric framework, drawing on tools from the manifold reconstruction literature, to analyze the high-dimensional geometry of adversarial examples. In particular, we highlight the importance of codimension: for low-dimensional data manifolds embedded in high-dimensional space there are many directions off the manifold in which to construct adversarial examples. Adversarial examples are a natural consequence of learning a decision boundary that classifies the low-dimensional data manifold well, but classifies points near the manifold incorrectly. Using our geometric framework we prove (1) a tradeoff between robustness under different norms, (2) that adversarial training in balls around the data is sample inefficient, and (3) sufficient sampling conditions under which nearest neighbor classifiers and ball-based adversarial training are robust.
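The role of codimension can be illustrated with a toy sketch. It is not the paper's construction or proof. The setup is an assumption chosen for illustration: the data manifold is the first coordinate axis of R^d, so the codimension is d - 1, and the classifier is a linear rule that labels every on-manifold point correctly but puts a small hypothetical weight `c` on each off-manifold coordinate. The function name `off_manifold_attack` and the parameters `c` and `eps` are likewise illustrative. The perturbation moves each off-manifold coordinate by at most `eps`, so its L-infinity size is fixed while its L2 size grows as `eps * sqrt(d - 1)`.

```python
import math


def linear_score(w, x):
    """Score of a linear classifier; the predicted label is the sign of the score."""
    return sum(wi * xi for wi, xi in zip(w, x))


def off_manifold_attack(d, c=0.1, eps=0.05):
    """Perturb an on-manifold point only along the d - 1 off-manifold directions.

    Assumed toy setup (not from the paper): the manifold is the first
    coordinate axis, the clean point is (1, 0, ..., 0) with label +1, and the
    classifier weights are (1, c, ..., c).
    """
    w = [1.0] + [c] * (d - 1)
    x = [1.0] + [0.0] * (d - 1)

    # Leave the on-manifold coordinate untouched; shift every off-manifold
    # coordinate by eps against the sign of its weight.
    x_adv = [x[0]] + [
        xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x[1:], w[1:])
    ]

    delta = [a - b for a, b in zip(x_adv, x)]
    linf = max(abs(t) for t in delta)
    l2 = math.sqrt(sum(t * t for t in delta))
    return linear_score(w, x), linear_score(w, x_adv), linf, l2


if __name__ == "__main__":
    for d in (11, 1001):
        clean, adv, linf, l2 = off_manifold_attack(d)
        print(f"d={d}: clean={clean:.3f} adv={adv:.3f} Linf={linf:.3f} L2={l2:.3f}")
```

With the default values, the perturbed point keeps a positive score when the codimension is 10 but flips to a negative score when the codimension is 1000, even though the per-coordinate change is the same in both cases. The growing L2 size of that same perturbation also hints at why robustness guarantees under one norm need not carry over to another in high dimensions.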