Adversarial Training and Provable Robustness: A Tale of Two Objectives
This paper addresses the challenge of making neural networks robust to adversarial attacks in security-critical applications; it represents an incremental improvement over prior methods.
The paper tackles the problem of training certifiably robust neural networks by combining adversarial training and provable robustness verification, achieving results such as 6.60% verified test error on MNIST at epsilon = 0.3 and 66.57% on CIFAR-10 with epsilon = 8/255.
We propose a principled framework that combines adversarial training and provable robustness verification for training certifiably robust neural networks. We formulate training as a joint optimization problem with both empirical and provable robustness objectives, and we develop a novel gradient-descent technique that eliminates bias in stochastic multi-gradients. We provide a theoretical analysis of the convergence of the proposed technique and an experimental comparison with state-of-the-art methods. Results on MNIST and CIFAR-10 show that our method consistently matches or outperforms prior approaches for provable ℓ∞ robustness. Notably, we achieve 6.60% verified test error on MNIST at epsilon = 0.3, and 66.57% on CIFAR-10 with epsilon = 8/255.
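To make the idea of a joint, two-objective update concrete, here is a minimal sketch of the standard min-norm (MGDA-style) combination of two gradients. It is not the paper's bias-eliminating technique, whose details the abstract does not give. The names `g_adv` and `g_cert` (gradients of a hypothetical adversarial loss and a hypothetical certified loss), the function names, and the learning rate are all assumptions made for illustration.

```python
# Illustrative sketch only: the textbook closed-form min-norm combination of
# two gradients, NOT the bias-corrected method proposed in the paper.
# g_adv / g_cert are hypothetical gradients of the empirical (adversarial)
# and provable (certified) robustness losses, given as plain lists of floats.

def min_norm_weight(g_adv, g_cert):
    """Return gamma in [0, 1] minimizing ||gamma*g_adv + (1-gamma)*g_cert||^2."""
    diff = [a - c for a, c in zip(g_adv, g_cert)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # gradients coincide; every weight gives the same direction
    # Unconstrained minimizer: (g_cert - g_adv) . g_cert / ||g_adv - g_cert||^2
    num = sum((c - a) * c for a, c in zip(g_adv, g_cert))
    return min(1.0, max(0.0, num / denom))  # clip onto the simplex


def combined_direction(g_adv, g_cert):
    """Convex combination of the two gradients with the min-norm weight."""
    gamma = min_norm_weight(g_adv, g_cert)
    return [gamma * a + (1.0 - gamma) * c for a, c in zip(g_adv, g_cert)]


def sgd_step(params, g_adv, g_cert, lr=0.1):
    """One descent step along the combined direction (lr is an assumed value)."""
    direction = combined_direction(g_adv, g_cert)
    return [p - lr * d for p, d in zip(params, direction)]
```

When the weight lies strictly between 0 and 1, the combined direction has a non-negative inner product with both gradients, so a small step does not increase either objective to first order. With mini-batch gradients the weight itself depends on noisy estimates, so the combined direction is in general a biased estimate of the full-batch one. The abstract says this is the bias the proposed technique removes; the sketch makes no attempt at that correction.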