LG ML Mar 19, 2020

Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates

arXiv:2003.08937v120.857 citations Has Code

Originality: Highly original

AI Analysis

This work exposes a vulnerability in certified defenses: the robustness certificate itself can be spoofed, which undermines the security of machine learning systems that rely on such guarantees.

The paper targets certified classifiers, which accompany each prediction with a guarantee of robustness against adversarial attacks. It introduces a new attack, the Shadow Attack, that causes these networks to mislabel an image and at the same time issue a false robustness certificate, while the perturbation remains imperceptible.

To deflect adversarial attacks, a range of "certified" classifiers have been proposed. In addition to labeling an image, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an $\ell_p$-bounded adversarial example. We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.
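The idea of a "spoofed" certificate can be illustrated with a toy example. The sketch below is not the paper's Shadow Attack and does not model imperceptibility; it assumes a hypothetical binary linear classifier, for which the certified $\ell_2$ radius of an input is its distance to the decision boundary, $|w \cdot x + b| / \|w\|_2$. The names `predict_and_certify`, `w`, `b`, `x`, and `delta` are illustrative assumptions. It shows only that a large perturbation can push an input deep into the wrong class, where the certifier returns a larger radius than it did for the clean input.

```python
import math

def predict_and_certify(w, b, x):
    """Toy certifier (assumption: binary linear classifier).

    Returns the predicted label and the l2 radius within which
    the prediction provably cannot change, i.e. the distance
    from x to the hyperplane w.x + b = 0.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    label = 1 if score >= 0 else 0
    return label, abs(score) / norm_w

# Hypothetical classifier and clean input.
w = [3.0, 4.0]
b = 0.0
x = [1.0, 0.5]

clean_label, clean_radius = predict_and_certify(w, b, x)

# A large perturbation (l2 norm 3, well beyond the clean radius)
# along the direction that decreases the score fastest.
norm_w = math.sqrt(sum(wi * wi for wi in w))
delta = [-3.0 * wi / norm_w for wi in w]
x_adv = [xi + di for xi, di in zip(x, delta)]

adv_label, adv_radius = predict_and_certify(w, b, x_adv)

print("clean:    ", clean_label, round(clean_radius, 6))
print("perturbed:", adv_label, round(adv_radius, 6))
```

In this toy setting the perturbed input receives the wrong label together with a certificate that is stronger than the clean one. The certificate for the clean input is never violated, because the perturbation is larger than the certified radius; the attack described in the abstract relies on perturbations that are large in $\ell_p$ norm yet still imperceptible, which this sketch does not attempt to reproduce.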
