LG · AI · Dec 18, 2022

Confidence-aware Training of Smoothed Classifiers for Certified Robustness

arXiv:2212.09000v2 · 11.114 citations · h-index: 54 · Has Code

Originality: Incremental advance

AI Analysis

This work aims to improve the certified adversarial robustness of machine learning models, which matters most in security-critical applications. The advance is incremental, since it builds on existing smoothed-classifier frameworks.

The paper tackles the trade-off between accuracy and adversarial robustness in smoothed classifiers. It proposes a training method that uses accuracy under Gaussian noise as a proxy for adversarial robustness, which makes it possible to control robustness sample by sample. The method improves certified robustness over state-of-the-art training methods, and the gains persist for other notions of robustness, such as robustness to common corruptions.

Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
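The two quantities the abstract relies on, the smoothed prediction and "accuracy under Gaussian noise", can both be estimated by Monte Carlo sampling. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `noisy_accuracy`, `smoothed_predict`, and `toy_classify` are hypothetical, the base classifier is a toy rule chosen only so the example runs, and the paper's actual training objective is not reproduced here.

```python
import numpy as np


def noisy_accuracy(classify, x, label, sigma, n_samples=1000, rng=None):
    """Estimate accuracy under Gaussian noise for a single input.

    Returns the fraction of noisy copies x + delta, delta ~ N(0, sigma^2 I),
    that the base classifier assigns to `label`.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = classify(x[None, ...] + noise)  # broadcast x over the noise batch
    return float(np.mean(preds == label))


def smoothed_predict(classify, x, sigma, num_classes, n_samples=1000, rng=None):
    """Majority vote of the base classifier over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = classify(x[None, ...] + noise)
    counts = np.bincount(preds, minlength=num_classes)
    return int(np.argmax(counts))


def toy_classify(batch):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return (batch[:, 0] > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    far = np.array([2.0, 0.0])   # far from the decision boundary
    near = np.array([0.0, 0.0])  # on the decision boundary
    print(noisy_accuracy(toy_classify, far, 1, sigma=0.5, rng=rng))
    print(noisy_accuracy(toy_classify, near, 1, sigma=0.5, rng=rng))
    print(smoothed_predict(toy_classify, far, sigma=0.5, num_classes=2, rng=rng))
```

In this toy setting, an input far from the decision boundary has noisy accuracy close to 1, while an input on the boundary sits near 0.5. A per-sample score of this kind is what the abstract describes as the proxy for deciding which training samples receive the worst-case (adversarial) objective; how the score enters the loss is specific to the paper and is not shown here.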
