On the existence of solutions to adversarial training in multiclass classification
This work provides theoretical guarantees that robust classifiers against adversarial perturbations exist, addressing a foundational problem in machine learning security, though the contribution is incremental relative to prior work.
The paper proves the existence of Borel measurable robust classifiers for adversarial training in multiclass classification. It extends the problem's known connections with optimal transport, links it to total variation regularization, and, as a corollary, improves prior existence results in binary classification.
We study three models of the problem of adversarial training in multiclass classification designed to construct robust classifiers against adversarial perturbations of data in the agnostic-classifier setting. We prove the existence of Borel measurable robust classifiers in each model and provide a unified perspective on the adversarial training problem, expanding the connections with optimal transport initiated by the authors in previous work and developing new connections between adversarial training in the multiclass setting and total variation regularization. As a corollary of our results, we prove the existence of Borel measurable solutions to the agnostic adversarial training problem in the binary classification setting. This improves on existing results in the adversarial training literature, where robust classifiers were only known to exist within the enlarged universal $\sigma$-algebra of the feature space.
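To make the agnostic adversarial training objective a little more concrete, here is a minimal sketch assuming one common formulation, not necessarily any of the three models studied in the paper: a probabilistic classifier assigns each point a vector in the probability simplex, and an adversary may move each data point anywhere within a closed ε-ball, so the loss at $(x, y)$ is $\sup_{x' \in B_\varepsilon(x)} (1 - f_y(x'))$. The feature space is assumed to be a finite set of points on the real line, which turns the supremum into a minimum over finitely many points. The function name `adversarial_risk` and this discretization are hypothetical, and the sketch only evaluates the risk of a given classifier; it does not solve the training problem or reproduce the paper's existence arguments.

```python
import numpy as np


def adversarial_risk(points, labels, weights, probs, eps):
    """Adversarial risk of a probabilistic classifier on a finite feature space (sketch).

    points:  (n,) positions on the real line (hypothetical discretization of X)
    labels:  (n,) integer class labels y_i
    weights: (n,) nonnegative masses of the empirical data distribution
    probs:   (n, K) rows in the probability simplex; probs[j, k] = f_k(points[j])
    eps:     adversarial budget; x_i may be moved to any points[j] with |x_j - x_i| <= eps
    """
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    risk = 0.0
    for x, y, w in zip(points, labels, weights):
        # Closed eps-ball around x, restricted to the finite feature set.
        ball = np.abs(points - x) <= eps
        # Worst-case loss: sup over the ball of (1 - f_y) = 1 - min over the ball of f_y.
        risk += w * (1.0 - probs[ball, y].min())
    return risk


# Toy data: three equally weighted points, two classes.
pts = [0.0, 1.0, 2.0]
labs = [0, 0, 1]
w = [1 / 3, 1 / 3, 1 / 3]

# A hard (one-hot) classifier that matches the labels exactly.
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# A classifier that hedges between the two classes near the class boundary.
hedged = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]

print(adversarial_risk(pts, labs, w, hard, 1.0))
print(adversarial_risk(pts, labs, w, hedged, 1.0))
```

With ε = 1 on this toy data, the hard classifier incurs risk 2/3 while the hedged one incurs 1/2, a small illustration of how allowing values in the simplex, rather than only hard labels, can lower the worst-case risk.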