On Evaluating Adversarial Robustness
This work addresses the problem of unreliable security evaluations in adversarial machine learning, a problem that affects researchers, reviewers, and practitioners in the field.
The paper addresses the difficulty of correctly evaluating defenses against adversarial examples, noting that most proposed defenses fail under adaptive attacks. It discusses methodological foundations, reviews commonly accepted best practices, and suggests new evaluation methods to help researchers and reviewers avoid common pitfalls.
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose defenses are quickly shown to be incorrect. We believe a large contributing factor is the difficulty of performing security evaluations. In this paper, we discuss the methodological foundations, review commonly accepted best practices, and suggest new methods for evaluating defenses against adversarial examples. We hope that both researchers developing defenses and readers and reviewers who wish to understand the completeness of an evaluation will consider our advice in order to avoid common pitfalls.