LG · ML · Mar 19, 2020

Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates

arXiv:2003.08937v1 · 57 citations
AI Analysis

This work exposes a critical vulnerability in certified defenses, impacting the security of machine learning systems that rely on such guarantees.

The paper targets certified classifiers, which guarantee robustness against norm-bounded adversarial perturbations. It introduces a new attack, the Shadow Attack, which causes these networks to mislabel images and issue false robustness certificates while keeping the perturbation imperceptible.

To deflect adversarial attacks, a range of "certified" classifiers have been proposed. In addition to labeling an image, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an $\ell_p$-bounded adversarial example. We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.
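The idea of a spoofed certificate can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a two-dimensional linear classifier (the weights `w`, bias `b`, and the helper names `certify` and `spoof` are all hypothetical), for which the $\ell_2$ distance to the decision boundary is exactly $|w \cdot x + b| / \|w\|_2$. It shows only that a perturbation larger than the certified radius can land a point far enough on the wrong side of the boundary that the certificate check passes again, now for the wrong label. It does not model imperceptibility, which the abstract says the real attack preserves.

```python
import math

def certify(w, b, x, radius):
    """Return (label, certified) for a toy linear classifier sign(w.x + b).

    The l2 distance from x to the decision boundary is |w.x + b| / ||w||_2,
    so no perturbation with a smaller l2 norm can change the label.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    label = 1 if score >= 0 else -1
    return label, abs(score) / norm > radius

def spoof(w, b, x, radius, slack=1e-3):
    """Move x straight across the boundary until it certifies on the other side.

    Assumption for illustration only: the attacker knows w and b exactly.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    direction = -1.0 if score >= 0 else 1.0  # head toward the opposite class
    # Step length: distance to the boundary, plus the radius, plus a small slack.
    step = abs(score) / norm + radius + slack
    return [xi + direction * step * wi / norm for xi, wi in zip(x, w)]

# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
w, b = [3.0, 4.0], -1.0
x = [1.0, 1.0]
radius = 0.5

clean_label, clean_cert = certify(w, b, x, radius)
x_adv = spoof(w, b, x, radius)
adv_label, adv_cert = certify(w, b, x_adv, radius)
perturbation_norm = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_adv, x)))
```

Here the clean point is certified with one label and the perturbed point is certified with the opposite label, while `perturbation_norm` exceeds `radius`. The certificate is therefore honest about the neighbourhood of `x_adv` yet misleading about the original input, which is the sense in which the abstract calls it "spoofed".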

Code Implementations: 1 repo