LG ML Jun 18, 2020

Beware the Black-Box: on the Robustness of Recent Defenses to Adversarial Examples

Kaleel Mahmood, Deniz Gurevin, Marten van Dijk, Phuong Ha Nguyen

arXiv:2006.10876v28.528 citations Has Code

Originality: Incremental advance

AI Analysis

This work highlights a critical gap in adversarial machine learning by showing that many defenses are vulnerable to black-box attacks, urging the field to develop more robust solutions.

The paper evaluated nine recent adversarial defense methods against adaptive black-box attacks on CIFAR-10 and Fashion-MNIST datasets, finding that most defenses (7 out of 9) offered only marginal security improvements of less than 25% compared to undefended networks.

Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks. They do not properly examine black-box attacks. In this paper, we expand upon the analysis of these defenses to include adaptive black-box adversaries. Our evaluation is done on nine defenses including Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation is done using two black-box adversarial models and six widely studied adversarial attacks for CIFAR-10 and Fashion-MNIST datasets. Our analyses show most recent defenses (7 out of 9) provide only marginal improvements in security ($<25\%$), as compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal, and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.
