Towards Bridging the gap between Empirical and Certified Robustness against Adversarial Examples
This work addresses the challenge of achieving high empirical and certified robustness against adversarial examples at the same time. It is aimed at researchers and practitioners working on robust machine learning and offers an incremental improvement by combining existing techniques.
This paper proposes a method called "Certification through Adaptation" that converts adversarially trained (AT) models into randomized smoothing classifiers during inference to provide certified robustness without sacrificing empirical robustness. The authors also introduce "Auto-Noise" to efficiently determine appropriate noise levels for certification. The approach achieves average certified radius (ACR) scores of up to 1.102 for CIFAR-10 and 1.148 for ImageNet while maintaining empirical robustness and benign accuracy.
Current state-of-the-art defense methods against adversarial examples typically focus on improving either empirical or certified robustness. Among them, adversarially trained (AT) models provide state-of-the-art empirical defense against adversarial examples, but offer no robustness guarantees for large classifiers or higher-dimensional inputs. In contrast, existing randomized-smoothing-based models achieve state-of-the-art certified robustness while significantly degrading empirical robustness against adversarial examples. In this paper, we propose a novel method, called \emph{Certification through Adaptation}, that transforms an AT model into a randomized smoothing classifier during inference to provide certified robustness under the $\ell_2$ norm without affecting its empirical robustness against adversarial attacks. We also propose the \emph{Auto-Noise} technique, which efficiently approximates appropriate noise levels to flexibly certify test examples using randomized smoothing. Our proposed \emph{Certification through Adaptation} with \emph{Auto-Noise} achieves \textit{average certified radius (ACR)} scores of up to $1.102$ and $1.148$ on the CIFAR-10 and ImageNet datasets, respectively, using AT models without affecting their empirical robustness or benign accuracy. Our paper is therefore a step towards bridging the gap between empirical and certified robustness against adversarial examples by achieving both with the same classifier.
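To make the quantities in the abstract concrete, the following is a minimal sketch of generic Monte Carlo randomized smoothing certification and of how an ACR score is averaged. It is not the \emph{Certification through Adaptation} or \emph{Auto-Noise} procedure, whose details are not given here. Several things are assumptions made for illustration: the function names (`sample_counts`, `certify`, `average_certified_radius`), the toy base classifier, the fixed noise level `sigma`, and the use of a simple one-sided Hoeffding bound where published implementations usually use a tighter Clopper-Pearson bound. The radius formula $R = \sigma\,\Phi^{-1}(\underline{p_A})$ is the standard $\ell_2$ certificate for Gaussian smoothing, with $\underline{p_A}$ a lower confidence bound on the top-class probability.

```python
import math
import random
from statistics import NormalDist


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return int(x[0] > 0.0)


def sample_counts(base_classifier, x, sigma, n, num_classes, rng):
    """Count base-classifier votes on n Gaussian-perturbed copies of x."""
    counts = [0] * num_classes
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    return counts


def certify(base_classifier, x, sigma, n0=100, n=1000, alpha=0.001,
            num_classes=2, rng=None):
    """Return (predicted class, certified l2 radius), or (None, 0.0) to abstain."""
    if rng is None:
        rng = random.Random(0)
    # Stage 1: guess the top class from a small, separate batch of samples.
    guess = sample_counts(base_classifier, x, sigma, n0, num_classes, rng)
    top = max(range(num_classes), key=guess.__getitem__)
    # Stage 2: estimate that class's probability from fresh samples.
    counts = sample_counts(base_classifier, x, sigma, n, num_classes, rng)
    p_hat = counts[top] / n
    # One-sided Hoeffding lower bound, holding with probability >= 1 - alpha.
    # (Assumption: chosen for simplicity; looser than Clopper-Pearson.)
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # cannot certify: abstain
    return top, sigma * NormalDist().inv_cdf(p_lower)


def average_certified_radius(base_classifier, dataset, sigma, **kwargs):
    """Mean certified radius; wrong or abstained predictions contribute 0."""
    total = 0.0
    for x, y in dataset:
        pred, radius = certify(base_classifier, x, sigma, **kwargs)
        total += radius if pred == y else 0.0
    return total / len(dataset)


if __name__ == "__main__":
    data = [([2.0, 0.0], 1), ([-2.0, 0.0], 1)]  # second label is deliberately wrong
    print(certify(toy_classifier, [2.0, 0.0], sigma=0.5))
    print(average_certified_radius(toy_classifier, data, sigma=0.5))
```

In this sketch the certified radius grows with both the noise level and the confidence of the smoothed prediction, while a larger `sigma` also makes the base classifier's votes noisier. That trade-off is why the choice of noise level matters for certification.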