Adversarial Risk Bounds via Function Transformation
This work addresses the need for theoretical guarantees on adversarial robustness in machine learning. Its contribution may be incremental, since the analysis reduces to standard learning-theoretic techniques.
The paper characterizes the robustness of classifiers to adversarial perturbations by bounding the adversarial risk through a new class of function transformations, obtaining error rates of the same order as the generalization error for linear and neural network models.
We derive bounds for a notion of adversarial risk, designed to characterize the robustness of linear and neural network classifiers to adversarial perturbations. Specifically, we introduce a new class of function transformations with the property that the risk of the transformed functions upper-bounds the adversarial risk of the original functions. This reduces the problem of deriving bounds on the adversarial risk to the problem of deriving risk bounds using standard learning-theoretic techniques. We then derive bounds on the Rademacher complexities of the transformed function classes, obtaining error rates on the same order as the generalization error of the original function classes. We also discuss extensions of our theory to multiclass classification and regression. Finally, we provide two algorithms for optimizing the adversarial risk bounds in the linear case, and discuss connections to regularization and distributional robustness.
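The reduction from adversarial risk to the ordinary risk of a transformed function can be illustrated in the linear case. The sketch below is not taken from the paper; it rests on assumptions the abstract does not state: binary labels in {-1, +1}, the hinge loss, and perturbations bounded in the l_inf norm by a radius eps. Under those assumptions the worst-case margin has the closed form y<w, x> - eps * ||w||_1, so the adversarial loss of the original classifier equals the ordinary loss of a transformed one. All function names are hypothetical, and the paper's transformations may differ.

```python
import itertools
import math


def margin(w, x, y):
    """Signed margin y * <w, x> of a linear classifier, with y in {-1, +1}."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def hinge(m):
    """Hinge loss, a non-increasing function of the margin."""
    return max(0.0, 1.0 - m)


def transformed_loss(w, x, y, eps):
    """Loss of the transformed function (assumed l_inf threat model).

    The worst-case margin over the l_inf ball of radius eps is
    y * <w, x> - eps * ||w||_1, so no search over perturbations is needed.
    """
    l1_norm = sum(abs(wi) for wi in w)
    return hinge(margin(w, x, y) - eps * l1_norm)


def brute_force_adversarial_loss(w, x, y, eps):
    """Adversarial loss computed by enumerating the corners of the l_inf ball.

    The margin is linear in the perturbation, so its minimum over the box
    is attained at a corner; enumeration is exact but exponential in dimension.
    """
    worst = -math.inf
    for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
        x_adv = [xi + eps * s for xi, s in zip(x, signs)]
        worst = max(worst, hinge(margin(w, x_adv, y)))
    return worst


# Hypothetical toy values chosen only for illustration.
w = [0.5, -1.0, 2.0]
x = [1.0, 0.5, 0.25]
y = 1
eps = 0.1

clean = hinge(margin(w, x, y))
closed = transformed_loss(w, x, y, eps)
brute = brute_force_adversarial_loss(w, x, y, eps)
print(clean, closed, brute)
```

In this toy setting the closed form and the brute-force search agree, and both are at least as large as the clean loss. Averaging `transformed_loss` over a sample therefore gives the empirical adversarial risk directly, which is the kind of quantity that standard risk bounds can then control.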