ML, DIS-NN, CR, LG · Jun 14, 2025

On the existence of consistent adversarial attacks in high-dimensional linear classification

arXiv:2506.12454v1 · 1 citation · h-index: 5
Originality: Highly original
AI Analysis

This work addresses a foundational problem in adversarial robustness, offering machine learning researchers theoretical insight into model vulnerabilities in high-dimensional settings.

The paper investigates the distinction between adversarial attacks and misclassifications due to limited data in high-dimensional binary classification, introducing a new error metric to quantify vulnerability to label-preserving perturbations. The theoretical analysis shows that as models become more overparameterized, their vulnerability to such attacks increases, providing insights into model sensitivity mechanisms.

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce a new error metric that precisely captures this distinction, quantifying model vulnerability to consistent adversarial attacks, that is, perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.
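To make the notion of a label-preserving perturbation concrete, the following is a minimal numerical sketch, not the paper's metric or its asymptotic analysis. Everything in it is an assumption chosen for illustration: Gaussian inputs, a linear ground-truth direction `w_star`, a ridge-regression student `w_hat`, an l2 attack budget `eps`, and the function name `attack_error_rates`. The label-preserving attack used here moves each input only in directions orthogonal to `w_star`, so the ground-truth label cannot change. This is one valid consistent attack rather than the strongest one, so it can only give a lower bound on the vulnerability under this illustrative definition.

```python
import numpy as np

def attack_error_rates(d=200, n=400, n_test=2000, eps=0.5, seed=0):
    """Toy comparison of clean, label-preserving, and standard l2 attack errors.

    All modelling choices (Gaussian data, ridge student, projected attack)
    are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground-truth (teacher) direction and training data.
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)

    # Student: ridge-regularised least squares on the +/-1 labels.
    w_hat = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)

    X_test = rng.standard_normal((n_test, d))
    y_test = np.sign(X_test @ w_star)
    margin = y_test * (X_test @ w_hat)

    # Errors without any attack.
    clean_error = np.mean(margin < 0)

    # Standard worst-case l2 attack on a linear model: the prediction
    # flips whenever the margin is below eps * ||w_hat||.
    adv_error = np.mean(margin < eps * np.linalg.norm(w_hat))

    # Label-preserving attack: remove the component of w_hat along w_star,
    # then push against the student along the remaining direction.
    u = w_star / np.linalg.norm(w_star)
    w_perp = w_hat - (w_hat @ u) * u
    direction = w_perp / np.linalg.norm(w_perp)
    delta = -eps * y_test[:, None] * direction[None, :]
    X_adv = X_test + delta

    # The teacher's score, and hence the ground-truth label, is unchanged.
    teacher_unchanged = np.allclose(X_adv @ w_star, X_test @ w_star)
    consistent_error = np.mean(y_test * (X_adv @ w_hat) < 0)

    return clean_error, consistent_error, adv_error, teacher_unchanged

if __name__ == "__main__":
    clean, consistent, adv, ok = attack_error_rates()
    print(f"clean={clean:.3f} consistent={consistent:.3f} adversarial={adv:.3f} ok={ok}")
```

In this sketch the label-preserving error always lies between the clean error and the standard adversarial error, because the projected attack has effective strength `eps * ||w_perp||`, which is at most `eps * ||w_hat||`. Varying `d` relative to `n` is one way to explore, informally, how the gap between the three quantities changes with overparameterization.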
