Evading Black-box Classifiers Without Breaking Eggs
This work addresses a critical issue for security-critical machine learning systems, such as those detecting malware or harmful content, by proposing a more realistic cost model for evasion attacks, though it is incremental in nature.
The paper tackles the problem of decision-based evasion attacks on black-box classifiers by highlighting the flawed cost metric of total queries, which fails to account for the asymmetric cost of 'bad' queries that trigger security filters. The authors design new attacks that reduce bad queries by 1.5-7.3 times, though at the expense of increased total queries, and pose an open problem for more effective attacks under realistic cost metrics.
Decision-based evasion attacks repeatedly query a black-box classifier to generate adversarial examples. Prior work measures the cost of such attacks by the total number of queries made to the classifier. We argue this metric is flawed. Most security-critical machine learning systems aim to weed out "bad" data (e.g., malware, harmful content, etc). Queries to such systems carry a fundamentally asymmetric cost: queries detected as "bad" come at a higher cost because they trigger additional security filters, e.g., usage throttling or account suspension. Yet, we find that existing decision-based attacks issue a large number of "bad" queries, which likely renders them ineffective against security-critical systems. We then design new attacks that reduce the number of bad queries by $1.5$-$7.3\times$, but often at a significant increase in total (non-bad) queries. We thus pose it as an open problem to build black-box attacks that are more effective under realistic cost metrics.