Attack as Defense: Characterizing Adversarial Examples using Robustness
This work addresses the vulnerability of deep learning systems to adversarial attacks and offers a potentially robust defense mechanism for real-world applications, though it appears incremental because it builds on existing robustness concepts.
The paper proposes Attack as Defense (A2D), a defense framework for deep learning software that detects adversarial examples by their lower robustness compared with benign inputs. Against carefully designed adaptive attacks, the reported attack success rate drops to 0% on CIFAR10.
As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning based software has been found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurements do not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by efficiently evaluating an example's robustness. A2D uses the cost of attacking an input as a robustness measure and flags less robust examples as adversarial, since less robust examples are easier to attack. Extensive experimental results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A2D remains effective against carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.
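The detection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy binary linear classifier in place of a deep network, a simple sign-gradient attack, and the number of attack steps as the "cost of attacking". The names `predict`, `attack_cost`, `is_adversarial`, and the `step`, `max_steps`, and `threshold` parameters are all hypothetical and chosen only for illustration.

```python
import numpy as np


def predict(w, b, x):
    """Toy binary linear classifier: label 1 if w.x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)


def attack_cost(w, b, x, step=0.05, max_steps=200):
    """Count the sign-gradient steps needed to flip the predicted label.

    The step count stands in for the attack cost (an assumption of this
    sketch). Returns max_steps if the label never flips within the budget.
    """
    original = predict(w, b, x)
    # For a linear model the gradient of the score w.r.t. x is w, so we
    # lower the score when the label is 1 and raise it when the label is 0.
    direction = -np.sign(w) if original == 1 else np.sign(w)
    x_adv = np.asarray(x, dtype=float).copy()
    for i in range(1, max_steps + 1):
        x_adv = x_adv + step * direction
        if predict(w, b, x_adv) != original:
            return i
    return max_steps


def is_adversarial(w, b, x, threshold):
    """Flag an input as adversarial if it is cheap to attack.

    The threshold is a hypothetical tuning parameter; in practice it would
    be calibrated on attack costs measured for known benign inputs.
    """
    return attack_cost(w, b, x) < threshold
```

In this toy setting, an input far from the decision boundary needs many steps to flip and is treated as benign, while an input sitting just across the boundary (as adversarial examples typically do) flips after a few steps and is flagged.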