LG · AI · Dec 18, 2022

Confidence-aware Training of Smoothed Classifiers for Certified Robustness

arXiv:2212.09000v2 · 14 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses certified adversarial robustness for machine learning models, which matters most in security-critical applications. The advance is incremental: it builds on existing smoothed-classifier frameworks.

The paper tackles the trade-off between accuracy and adversarial robustness in smoothed classifiers. It proposes a training method that uses accuracy under Gaussian noise as a proxy for robustness, which lets the method control robustness sample by sample. The result is improved certified robustness over state-of-the-art training methods, and the gains persist for other notions of robustness, such as robustness to common corruptions.

Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to $\ell_2$-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions.
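The two quantities the abstract relies on, the smoothed prediction and "accuracy under Gaussian noise", can both be estimated by Monte Carlo sampling. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy base classifier, and the sample counts are all hypothetical, and inputs are plain lists of floats.

```python
import random
from collections import Counter


def noisy_labels(classify, x, sigma, n_samples, rng):
    """Labels predicted by `classify` on Gaussian-perturbed copies of x."""
    labels = []
    for _ in range(n_samples):
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        labels.append(classify(noisy_x))
    return labels


def smoothed_predict(classify, x, sigma, n_samples=1000, seed=0):
    """Majority vote over noisy copies: a Monte Carlo estimate of the
    smoothed classifier's prediction at x."""
    rng = random.Random(seed)
    counts = Counter(noisy_labels(classify, x, sigma, n_samples, rng))
    return counts.most_common(1)[0][0]


def accuracy_under_noise(classify, x, y, sigma, n_samples=1000, seed=0):
    """Fraction of noisy copies of x that are assigned the true label y.
    This is the per-sample proxy for robustness described in the abstract."""
    rng = random.Random(seed)
    labels = noisy_labels(classify, x, sigma, n_samples, rng)
    return sum(label == y for label in labels) / n_samples


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0


if __name__ == "__main__":
    far = [2.0, 0.0]   # far from the decision boundary
    near = [0.0, 0.0]  # on the decision boundary
    print(smoothed_predict(toy_classifier, far, sigma=0.5))
    print(accuracy_under_noise(toy_classifier, far, y=1, sigma=0.5))
    print(accuracy_under_noise(toy_classifier, near, y=1, sigma=0.5))
```

In this toy setting, an input far from the decision boundary keeps its label under almost every noise draw, while an input on the boundary is classified correctly only about half the time. A sample-wise training rule of the kind the abstract describes could use such a score to decide which samples receive the worst-case (adversarial) objective. The exact rule is not given in this text.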

Code Implementations: 1 repo
