On the existence of consistent adversarial attacks in high-dimensional linear classification
This work addresses a foundational problem in adversarial robustness, offering machine learning researchers theoretical insight into model vulnerabilities in high-dimensional settings.
The paper investigates the distinction between adversarial attacks and misclassifications due to limited data in high-dimensional binary classification, introducing a new error metric to quantify vulnerability to label-preserving perturbations. The theoretical analysis shows that as models become more overparameterized, their vulnerability to such attacks increases, providing insights into model sensitivity mechanisms.
What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce a new error metric that precisely captures this distinction, quantifying model vulnerability to consistent adversarial attacks -- perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.
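To make the notion of a consistent (label-preserving) attack concrete, the following is a minimal numerical sketch, not the paper's metric or method. It assumes a linear "teacher" vector that defines the ground-truth labels, a linear "student" classifier, Gaussian inputs, and an l2 perturbation budget `eps`; all names (`attack_error_rates`, `w_teacher`, `w_student`, `eps`) are hypothetical. For each input it applies the single perturbation that most reduces the student's margin and checks whether the teacher's label is unchanged, so the reported consistent-attack rate covers only that one perturbation direction.

```python
import numpy as np

def attack_error_rates(w_student, w_teacher, X, eps):
    """Return (standard error, robust error, consistent-attack rate) for an l2 budget eps.

    Assumption: ground-truth labels are sign(X @ w_teacher); this is an
    illustrative setup, not the definition used in the paper.
    """
    y = np.sign(X @ w_teacher)  # ground-truth labels from the teacher
    # For a linear student, the l2 perturbation that most reduces the margin
    # points along -y * w_student / ||w_student||.
    direction = w_student / np.linalg.norm(w_student)
    X_adv = X - eps * y[:, None] * direction
    clean_wrong = np.sign(X @ w_student) != y     # misclassified without any attack
    fooled = np.sign(X_adv @ w_student) != y      # misclassified after the perturbation
    label_kept = np.sign(X_adv @ w_teacher) == y  # teacher label unchanged by the perturbation
    return clean_wrong.mean(), fooled.mean(), (fooled & label_kept).mean()

# Hypothetical usage: a student that is a noisy copy of the teacher.
rng = np.random.default_rng(0)
d, n = 200, 2000
w_teacher = rng.standard_normal(d)
w_student = w_teacher + 0.5 * rng.standard_normal(d)
X = rng.standard_normal((n, d)) / np.sqrt(d)
standard, robust, consistent = attack_error_rates(w_student, w_teacher, X, eps=0.1)
print(standard, robust, consistent)
```

In this sketch the robust error counts every perturbed point the student gets wrong, while the consistent rate keeps only those whose ground-truth label survives the perturbation, which is the distinction the abstract draws between attacks and ordinary misclassification.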