Evaluating Adversarial Robustness with Expected Viable Performance
This work addresses the need for better robustness evaluation metrics in machine learning. It appears incremental: it builds on existing concepts and does not claim a major breakthrough.
The paper addresses the evaluation of classifier robustness to adversarial perturbations by introducing a metric that quantifies robustness as the expected functionality of the classifier over perturbation bounds.
We introduce a metric for evaluating the robustness of a classifier, with particular attention to adversarial perturbations, in terms of its expected functionality over possible adversarial perturbations. A classifier is assumed to be non-functional (that is, to have a functionality of zero) with respect to a perturbation bound if a conventional measure of performance, such as classification accuracy, is less than a minimally viable threshold when the classifier is tested on examples from that perturbation bound. Defining robustness in terms of an expected value is motivated by a domain-general approach to robustness quantification.
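As a rough illustration of the definition above, the following Python sketch computes an expected functionality over a discrete set of perturbation bounds. Several details are assumptions made for the example and are not taken from the abstract: the function name `expected_viable_performance`, the use of a finite set of bounds with explicit weights standing in for a probability distribution over bounds, and the choice that a viable classifier's functionality equals its accuracy (the abstract only fixes functionality at zero below the threshold).

```python
from typing import Sequence


def expected_viable_performance(
    accuracies: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> float:
    """Expected functionality over a discrete set of perturbation bounds.

    accuracies[i] is the accuracy measured at the i-th perturbation bound.
    weights[i] is the (unnormalized) probability assigned to that bound.
    Assumption: functionality equals accuracy when accuracy >= threshold,
    and zero otherwise (only the zero case comes from the abstract).
    """
    if len(accuracies) != len(weights):
        raise ValueError("accuracies and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    score = 0.0
    for acc, w in zip(accuracies, weights):
        # Non-functional below the minimally viable threshold.
        functionality = acc if acc >= threshold else 0.0
        score += (w / total) * functionality
    return score


# Hypothetical accuracies at four increasingly large perturbation bounds.
accs = [0.95, 0.90, 0.70, 0.40]
# Hypothetical weights: smaller perturbations are assumed more likely.
wts = [0.4, 0.3, 0.2, 0.1]

evp = expected_viable_performance(accs, wts, threshold=0.75)
print(f"expected viable performance: {evp:.3f}")
```

With these made-up numbers, only the first two bounds clear the 0.75 threshold, so the last two contribute nothing to the expectation regardless of their weight. A different functionality mapping (for example, a binary viable or not-viable indicator) could be substituted on the single line that defines `functionality`.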