Certified Training: Small Boxes are All You Need
This work addresses the robustness-accuracy trade-off in adversarial machine learning with a new approach to certified training that could benefit security-critical applications, though it appears incremental because it builds on existing certified defense methods.
The paper tackles the problem of obtaining deterministic guarantees of adversarial robustness in neural networks. It proposes SABR, a certified training method that propagates interval bounds for only a small subset of the adversarial input region to approximate the worst-case loss. The authors report that SABR outperforms existing certified defenses in both standard and certifiable accuracy across datasets and perturbation magnitudes.
To obtain deterministic guarantees of adversarial robustness, specialized training methods are used. We propose SABR, a novel certified training method of this kind, based on the key insight that propagating interval bounds for a small but carefully selected subset of the adversarial input region is sufficient to approximate the worst-case loss over the whole region while significantly reducing approximation errors. We show in an extensive empirical evaluation that SABR outperforms existing certified defenses in terms of both standard and certifiable accuracies across perturbation magnitudes and datasets, pointing to a new class of certified training methods that promises to alleviate the robustness-accuracy trade-off.
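To make the key insight concrete, the following is a minimal sketch of interval bound propagation over a full perturbation region versus a small box inside it. It is an illustration under stated assumptions, not the authors' implementation: the toy two-layer network, the ratio `tau`, the helper `small_box`, and the use of a fixed `anchor` point as a stand-in for a carefully selected location are all hypothetical, since the abstract does not specify how the subset is chosen.

```python
def affine_bounds(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W x + b with interval arithmetic."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        center = sum(w * (l + h) / 2.0 for w, l, h in zip(row, lo, hi)) + bias
        radius = sum(abs(w) * (h - l) / 2.0 for w, l, h in zip(row, lo, hi))
        out_lo.append(center - radius)
        out_hi.append(center + radius)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(l, 0.0) for l in lo], [max(h, 0.0) for h in hi]


def ibp(lo, hi, layers):
    """Interval bounds for a ReLU network given as a list of (W, b) pairs."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:  # no activation after the last layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


def small_box(x, eps, tau, anchor):
    """Hypothetical helper: a box of radius tau * eps that stays inside the
    eps-region around x and is centred as close to `anchor` as possible."""
    r = tau * eps
    center = [min(max(a, xi - eps + r), xi + eps - r) for a, xi in zip(anchor, x)]
    return [c - r for c in center], [c + r for c in center]


# Toy network and input (assumed values, for illustration only).
layers = [([[1.0, -2.0], [0.5, 1.5]], [0.0, -0.1]),
          ([[1.0, -1.0]], [0.2])]
x, eps, tau = [0.5, 0.3], 0.1, 0.2
anchor = [0.6, 0.2]  # stand-in for a selected point, e.g. an adversarial example

# Bounds over the whole adversarial input region.
full_lo, full_hi = ibp([xi - eps for xi in x], [xi + eps for xi in x], layers)

# Bounds over the small box only.
box_lo, box_hi = small_box(x, eps, tau, anchor)
small_out_lo, small_out_hi = ibp(box_lo, box_hi, layers)

full_width = full_hi[0] - full_lo[0]
small_width = small_out_hi[0] - small_out_lo[0]
print("output width, full region:", full_width)
print("output width, small box:  ", small_width)
```

Because interval arithmetic is monotone under box inclusion, the bounds for the small box always lie within those for the full region, so they are tighter. The trade-off is that they no longer soundly cover the whole region, which is why the choice of where to place the small box matters.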