A Hierarchical Assessment of Adversarial Severity
This work addresses the missing dimension of severity assessment in adversarial robustness for neural networks, offering incremental improvements in defense strategies.
The paper tackles the problem of assessing the downstream impact of adversarial attacks by introducing 'Adversarial Severity' to quantify semantic errors, and proposes hierarchical attacks and training methods, resulting in a 1.85% boost in adversarial robustness and a 0.17 reduction in severity on average.
Adversarial Robustness is a growing field that evidences the brittleness of neural networks. Although the literature on adversarial robustness is vast, a dimension is missing in these studies: assessing how severe the mistakes are. We call this notion "Adversarial Severity" since it quantifies the downstream impact of adversarial corruptions by computing the semantic error between the misclassification and the proper label. We propose to study the effects of adversarial noise by measuring the Robustness and Severity into a large-scale dataset: iNaturalist-H. Our contributions are: (i) we introduce novel Hierarchical Attacks that harness the rich structured space of labels to create adversarial examples. (ii) These attacks allow us to benchmark the Adversarial Robustness and Severity of classification models. (iii) We enhance the traditional adversarial training with a simple yet effective Hierarchical Curriculum Training to learn these nodes gradually within the hierarchical tree. We perform extensive experiments showing that hierarchical defenses allow deep models to boost the adversarial Robustness by 1.85% and reduce the severity of all attacks by 0.17, on average.