Cost-Sensitive Robustness against Adversarial Examples
This work addresses the need for more practical adversarial robustness in machine learning by focusing on real-world scenarios in which not all attacks are equally harmful, although the contribution is incremental, since it builds on existing robust training methods.
The paper tackles the problem of training classifiers that are robust to adversarial examples by introducing cost-sensitive robustness, which prioritizes certain adversarial transformations over others based on a cost matrix. The result is a method that reduces cost-sensitive robust error on MNIST and CIFAR10 models while maintaining classification accuracy.
Several recent works have developed methods for training classifiers that are certifiably robust against norm-bounded adversarial perturbations. These methods assume that all adversarial transformations are equally important, which is seldom the case in real-world applications. We advocate cost-sensitive robustness as the criterion for measuring a classifier's performance on tasks where some adversarial transformations are more important than others. We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function that adapts the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. Our experiments on simple MNIST and CIFAR10 models with a variety of cost matrices show that the proposed approach can produce models with substantially reduced cost-sensitive robust error, while maintaining classification accuracy.
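To make the role of the cost matrix concrete, here is a minimal sketch of how a cost-sensitive robust error could be tallied. It assumes one natural empirical form, not necessarily the paper's exact definition: entry `cost[y][j]` is the harm of an adversary turning a class-`y` input into a prediction of class `j`, and each example contributes the summed cost of every transformation that cannot be certified. The per-class margin lower bounds are a hypothetical input, standing in for whatever a certifier (such as the dual bound of Wong & Kolter) would produce; the functions `reachable_targets`, `robust_error`, and `cost_sensitive_robust_error` are illustrative names, not an existing API.

```python
from typing import List, Sequence


def reachable_targets(margin_lower_bounds: Sequence[float], true_label: int) -> List[int]:
    # Assumption: margin_lower_bounds[j] is a certified lower bound on
    # logit[true_label] - logit[j] over the whole perturbation ball.
    # If that bound is not positive, the transformation true_label -> j
    # is not certified, so we treat class j as reachable by the adversary.
    return [j for j, lb in enumerate(margin_lower_bounds)
            if j != true_label and lb <= 0.0]


def robust_error(margin_bounds, labels) -> float:
    # Standard (cost-insensitive) robust error: the fraction of examples
    # with at least one reachable target class.
    bad = sum(1 for b, y in zip(margin_bounds, labels) if reachable_targets(b, y))
    return bad / len(labels)


def cost_sensitive_robust_error(cost, margin_bounds, labels) -> float:
    # Each reachable transformation y -> j adds cost[y][j]; the total is
    # averaged over examples (assumed empirical form, for illustration only).
    total = 0.0
    for bounds, y in zip(margin_bounds, labels):
        for j in reachable_targets(bounds, y):
            total += cost[y][j]
    return total / len(labels)


if __name__ == "__main__":
    # Toy 3-class setup: only turning class 0 into class 2 is considered harmful.
    cost = [[0, 0, 1],
            [0, 0, 0],
            [0, 0, 0]]
    # Hypothetical certified margin lower bounds, one row per example.
    margin_bounds = [[0.0, 0.5, -0.2],
                     [0.3, 0.0, -0.1],
                     [1.0, -0.4, 0.2]]
    labels = [0, 1, 0]
    print("robust error:", robust_error(margin_bounds, labels))
    print("cost-sensitive robust error:",
          cost_sensitive_robust_error(cost, margin_bounds, labels))
```

With these toy bounds none of the three examples is fully certified, so the standard robust error is 1.0; only the first example admits the costly transformation 0 → 2, so the cost-sensitive robust error is 1/3. This gap is what a cost-sensitive training objective can exploit: it only needs to certify the transformations that carry nonzero cost.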