The Interplay between Distribution Parameters and the Accuracy-Robustness Tradeoff in Classification
This work offers researchers in machine learning security theoretical insight into the fundamental limits of adversarial robustness, though the contribution is incremental because it builds on prior analyses of distributional parameters.
The paper studies the tradeoff between accuracy and robustness in adversarial training by analyzing a binary Gaussian mixture classification problem. It shows that, under certain conditions, the natural error gap between the optimal Bayes and adversarial classifiers is locally minimized when classes are balanced, and that under an ℓ∞-bounded perturbation with budget ε the gap is Θ(ε²) for the worst-case parameters.
Adversarial training tends to produce models that are less accurate on natural (unperturbed) examples than standard models. This can be attributed to either an algorithmic shortcoming or a fundamental property of the training data distribution, which admits different solutions for optimal standard and adversarial classifiers. In this work, we focus on the latter case under a binary Gaussian mixture classification problem. Unlike earlier work, we aim to derive the natural accuracy gap between the optimal Bayes and adversarial classifiers, and study the effect of different distributional parameters, namely separation between class centroids, class proportions, and the covariance matrix, on the derived gap. We show that under certain conditions, the natural error of the optimal adversarial classifier, as well as the gap, are locally minimized when classes are balanced, in contrast to the Bayes classifier, for which perfect balance induces the worst accuracy. Moreover, we show that with an $\ell_\infty$ bounded perturbation and an adversarial budget of $\epsilon$, this gap is $\Theta(\epsilon^2)$ for the worst-case parameters. For suitably small $\epsilon$, this indicates the theoretical possibility of achieving robust classifiers with near-perfect accuracy, a possibility that is rarely realized by practical algorithms.
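
To make the quantities in the abstract concrete, the following is a minimal one-dimensional sketch, not the paper's derivation. It assumes a toy mixture in which class $+1$ has prior $\pi$ and $x \sim N(\mu, \sigma^2)$, class $-1$ has prior $1-\pi$ and $x \sim N(-\mu, \sigma^2)$, and the classifier is restricted to the threshold form "predict $+1$ if $x > t$". In one dimension an $\ell_\infty$ perturbation of budget $\epsilon$ is just a shift of at most $\epsilon$, so within this restricted family the adversarial error of threshold $t$ equals the natural error on a mixture with means $\pm(\mu - \epsilon)$, and the robust threshold is the Bayes threshold of that shrunk mixture (this requires $\epsilon < \mu$). All function and parameter names below are hypothetical and chosen for illustration.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def threshold(pi, mu, sigma, eps=0.0):
    """Error-minimizing threshold for the toy mixture.

    eps = 0 gives the Bayes threshold; eps > 0 gives the best robust
    threshold within the assumed family "predict +1 if x > t".
    Assumes 0 < pi < 1 and 0 <= eps < mu.
    """
    return sigma ** 2 * math.log((1.0 - pi) / pi) / (2.0 * (mu - eps))


def natural_error(t, pi, mu, sigma):
    """Error on unperturbed data of the rule 'predict +1 if x > t'."""
    miss_pos = std_normal_cdf((t - mu) / sigma)        # class +1 falls at or below t
    miss_neg = 1.0 - std_normal_cdf((t + mu) / sigma)  # class -1 falls above t
    return pi * miss_pos + (1.0 - pi) * miss_neg


def natural_error_gap(pi, mu, sigma, eps):
    """Natural error of the robust threshold minus that of the Bayes threshold."""
    t_bayes = threshold(pi, mu, sigma)
    t_robust = threshold(pi, mu, sigma, eps)
    return natural_error(t_robust, pi, mu, sigma) - natural_error(t_bayes, pi, mu, sigma)


if __name__ == "__main__":
    mu, sigma, eps = 1.0, 1.0, 0.2  # illustrative values only
    for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
        bayes_err = natural_error(threshold(pi, mu, sigma), pi, mu, sigma)
        gap = natural_error_gap(pi, mu, sigma, eps)
        print(f"pi={pi:.1f}  bayes_error={bayes_err:.4f}  gap={gap:.6f}")
```

In this toy setting both thresholds coincide at $t = 0$ when $\pi = 0.5$, so the gap vanishes at perfect balance, while the Bayes error itself is largest there. The general multivariate results, including the role of the covariance matrix and the $\Theta(\epsilon^2)$ rate, are what the paper establishes and are not reproduced by this sketch.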