Uniform Convergence of Adversarially Robust Classifiers
This work provides a theoretical foundation for adversarially robust classification, a key issue in machine learning security, although it is incremental in that it builds on prior convergence results.
The paper studies how optimal classifiers behave under adversarial perturbations in the large-data limit. It shows that, as the adversarial strength approaches zero, these classifiers converge to the Bayes classifier in the Hausdorff distance, strengthening previous results that focused on $L^1$-type convergence.
In recent years there has been significant interest in the effect of different types of adversarial perturbations on data classification problems. Many of these models include the adversarial power as an important parameter, one that carries an associated trade-off between accuracy and robustness. This work considers a general framework for adversarially perturbed classification problems in a large-data or population-level limit. In this regime, we demonstrate that, as the adversarial strength goes to zero, optimal classifiers converge to the Bayes classifier in the Hausdorff distance. This significantly strengthens previous results, which generally focus on $L^1$-type convergence. The main argument relies upon direct geometric comparisons and is inspired by techniques from geometric measure theory.
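To illustrate why convergence in the Hausdorff distance is a stronger statement than $L^1$-type convergence, here is a minimal sketch that is not taken from the paper. It assumes one-dimensional classifiers represented by non-empty closed sets that are finite unions of pairwise-disjoint intervals, measures the $L^1$ distance with Lebesgue measure as $|A \triangle B| = \|1_A - 1_B\|_{L^1}$, and takes the Hausdorff distance between the decision regions themselves; the paper's precise formulation may differ, and all helper names are hypothetical. A thin spurious component of width $\delta$, placed far from a Bayes region $[0, 0.5]$, has $L^1$ distance $\delta \to 0$ but Hausdorff distance $0.4 + \delta$, which never vanishes.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a closed interval [lo, hi] with lo <= hi.
Interval = Tuple[float, float]


def measure(intervals: List[Interval]) -> float:
    """Lebesgue measure of a union of pairwise-disjoint closed intervals."""
    return sum(hi - lo for lo, hi in intervals)


def intersection_measure(a: List[Interval], b: List[Interval]) -> float:
    """Measure of A ∩ B, assuming each list holds pairwise-disjoint intervals."""
    return sum(
        max(0.0, min(ahi, bhi) - max(alo, blo))
        for alo, ahi in a
        for blo, bhi in b
    )


def l1_distance(a: List[Interval], b: List[Interval]) -> float:
    """||1_A - 1_B||_{L^1} = |A Δ B| = |A| + |B| - 2|A ∩ B| (exact)."""
    return measure(a) + measure(b) - 2.0 * intersection_measure(a, b)


def dist_to_set(x: float, intervals: List[Interval]) -> float:
    """Exact distance from the point x to a non-empty union of closed intervals."""
    return min(max(lo - x, 0.0, x - hi) for lo, hi in intervals)


def one_sided(a: List[Interval], b: List[Interval], step: float) -> float:
    """Approximate sup_{x in A} d(x, B) by sampling A with spacing <= step.

    d(., B) is 1-Lipschitz, so the sampling error is at most step / 2.
    """
    best = 0.0
    for lo, hi in a:
        n = max(1, math.ceil((hi - lo) / step))
        for k in range(n + 1):
            best = max(best, dist_to_set(lo + (hi - lo) * k / n, b))
    return best


def hausdorff_distance(a: List[Interval], b: List[Interval], step: float = 1e-4) -> float:
    """Symmetric Hausdorff distance: the larger of the two one-sided suprema."""
    return max(one_sided(a, b, step), one_sided(b, a, step))


if __name__ == "__main__":
    bayes = [(0.0, 0.5)]  # hypothetical Bayes decision region
    for delta in (0.1, 0.01, 0.001):
        # Candidate region: the Bayes region plus a thin, distant spurious piece.
        candidate = [(0.0, 0.5), (0.9, 0.9 + delta)]
        print(delta, l1_distance(bayes, candidate), hausdorff_distance(bayes, candidate))
```

For each $\delta$, the loop reports an $L^1$ distance of approximately $\delta$ and a Hausdorff distance of approximately $0.4 + \delta$: the $L^1$ metric is blind to small but distant errors, whereas the Hausdorff distance penalizes every point of the decision region that lies far from the other set.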