Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples
This work reveals a paradox: releasing robustness certifications can reduce security, which matters to practitioners who rely on robustness guarantees in adversarial machine learning.
The paper asks whether robustness certifications can compromise neural network security. It introduces a Certification Aware Attack that exploits certifications to produce norm-minimising adversarial examples 74% more often than comparable attacks, while reducing the median perturbation norm by more than 10%.
By guaranteeing the absence of adversarial examples in an instance's neighbourhood, certification mechanisms play an important role in demonstrating neural network robustness. In this paper, we ask whether these certifications can compromise the very models they help to protect. Our new \emph{Certification Aware Attack} exploits certifications to produce computationally efficient norm-minimising adversarial examples $74\%$ more often than comparable attacks, while reducing the median perturbation norm by more than $10\%$. While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.
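To make the intuition concrete, the following is a minimal toy sketch and not the paper's algorithm. It assumes a linear binary classifier, for which the exact $\ell_2$ certified radius at $x$ is $|w^\top x + b| / \|w\|_2$. The function names `certified_radius` and `certification_guided_attack` and the `overshoot` parameter are hypothetical, chosen only for illustration. The idea shown is that a certificate tells the attacker that no adversarial example exists within the certified radius, so every step can safely be at least that large.

```python
import numpy as np

def certified_radius(w, b, x):
    """Exact L2 certified radius of the linear classifier sign(w.x + b) at x.

    Assumption: a linear model, where the distance to the decision
    boundary is known in closed form. Real certifiers give lower bounds.
    """
    return abs(w @ x + b) / np.linalg.norm(w)

def certification_guided_attack(w, b, x, overshoot=1e-3, max_steps=50):
    """Toy attack: move toward the boundary, stepping just past the
    certified radius each iteration (hypothetical illustration only)."""
    label = np.sign(w @ x + b)
    x_adv = np.asarray(x, dtype=float).copy()
    for _ in range(max_steps):
        if np.sign(w @ x_adv + b) != label:
            break  # prediction changed: x_adv is adversarial
        # Unit direction that decreases the margin of the original label.
        direction = -label * w / np.linalg.norm(w)
        # No adversarial example lies within the certified radius, so a
        # smaller step would be wasted; step slightly beyond it.
        step = certified_radius(w, b, x_adv) + overshoot
        x_adv = x_adv + step * direction
    return x_adv

w = np.array([3.0, 4.0])
b = -1.0
x = np.array([2.0, 1.0])
x_adv = certification_guided_attack(w, b, x)
print("certified radius:", certified_radius(w, b, x))
print("perturbation norm:", np.linalg.norm(x_adv - x))
```

In this linear setting a single step suffices and the perturbation norm exceeds the certified radius only by `overshoot`, which illustrates how a tight certificate can guide an attacker toward a near-minimal perturbation.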