Unrestricted Adversarial Examples
This work addresses the need for more comprehensive safety evaluations in ML, though it is incremental: it reframes the threat model around unconstrained adversaries rather than introducing a new defense method.
The paper tackles the problem of evaluating the safety of machine learning models by shifting from norm-constrained to unconstrained adversarial attacks. It proposes a two-player contest, built on a simple, unambiguous dataset, to assess worst-case adversarial risk.
We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models and try to achieve high accuracy and coverage (the fraction of inputs on which the model makes a prediction rather than abstaining) on non-adversarial data while making no confident mistakes on adversarial inputs. Attackers try to subvert defenses by finding arbitrary unambiguous inputs where the model assigns an incorrect label with high confidence. We propose a simple unambiguous dataset ("bird-or-bicycle") to use as part of this contest. We hope this contest will help to more comprehensively evaluate the worst-case adversarial risk of machine learning models.
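The defender's objective can be made concrete with a small scoring sketch. This is a minimal illustration, not the contest's official evaluation code: it assumes the model outputs a single probability that an image is a bicycle, and it uses a hypothetical confidence `threshold` below which the model abstains. The names `decide`, `evaluate_defense`, and `DefenseReport` are illustrative. The sketch also assumes adversarial inputs arrive with agreed-upon true labels; it does not model how unambiguity is judged.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DefenseReport:
    coverage: float             # fraction of clean inputs the model answers
    accuracy_on_covered: float  # accuracy on the clean inputs it answers
    confident_mistakes: int     # adversarial inputs labeled wrongly with high confidence


def decide(p_bicycle: float, threshold: float) -> Optional[str]:
    """Return "bird" or "bicycle" if confident enough, else None (abstain).

    The threshold rule is a hypothetical choice for illustration.
    """
    confidence = max(p_bicycle, 1.0 - p_bicycle)
    if confidence < threshold:
        return None
    return "bicycle" if p_bicycle >= 0.5 else "bird"


def evaluate_defense(
    clean_probs: Sequence[float],
    clean_labels: Sequence[str],
    adv_probs: Sequence[float],
    adv_labels: Sequence[str],
    threshold: float = 0.9,
) -> DefenseReport:
    # Clean data: measure how often the model answers and how often it is right.
    answered: List[bool] = []  # correctness of each non-abstained prediction
    for p, y in zip(clean_probs, clean_labels):
        pred = decide(p, threshold)
        if pred is not None:
            answered.append(pred == y)
    coverage = len(answered) / len(clean_labels) if clean_labels else 0.0
    accuracy = sum(answered) / len(answered) if answered else 0.0

    # Adversarial data: abstaining is safe; a confident wrong label is a failure.
    mistakes = 0
    for p, y in zip(adv_probs, adv_labels):
        pred = decide(p, threshold)
        if pred is not None and pred != y:
            mistakes += 1

    return DefenseReport(coverage, accuracy, mistakes)


if __name__ == "__main__":
    report = evaluate_defense(
        clean_probs=[0.97, 0.02, 0.6, 0.95],
        clean_labels=["bicycle", "bird", "bicycle", "bird"],
        adv_probs=[0.99, 0.7],
        adv_labels=["bird", "bird"],
    )
    print(report)
```

In this toy run the model abstains on the 0.6 input, so coverage is 0.75 and accuracy on the answered inputs is 2/3. On the adversarial side it abstains on the 0.7 input, which costs nothing, but labels a bird image "bicycle" with 0.99 confidence, which counts as one confident mistake. Under the rule described above, that single mistake means the defense has been broken. Abstention is what lets a defender trade coverage for safety.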