Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks
The paper gives researchers and practitioners a comprehensive overview of adversarial vulnerabilities in DNNs, but as a survey its contribution is incremental.
This paper surveys recent advances in understanding adversarial robustness in deep neural networks, covering definitions, benchmarks, theoretical bounds, correlations with other model indicators, and costs of adversarial training.
Adversarial examples are inevitable as deep neural networks (DNNs) become pervasive in applications. Imperceptible perturbations applied to natural samples can lead DNN-based classifiers to output wrong predictions with fairly high confidence. It is therefore increasingly important to obtain robust models that resist adversarial examples. In this paper, we survey recent advances in understanding this intriguing property, adversarial robustness, from different perspectives. We first give preliminary definitions of adversarial attacks and robustness. We then study frequently used benchmarks and discuss theoretically proven bounds for adversarial robustness. Next, we provide an overview of analyses of the correlations between adversarial robustness and other critical indicators of DNN models. Lastly, we introduce recent arguments about the potential costs of adversarial training, which have attracted wide attention from the research community.