LG AI OC ML May 31, 2018

Scaling provable adversarial defenses

Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, J. Zico Kolter

arXiv:1805.12514v238.4470 citations Has Code

Originality: Highly original

AI Analysis

This work addresses the problem of scaling provable adversarial defenses for deep learning classifiers. It builds incrementally on earlier provable-defense methods but offers strong, specific gains for researchers and practitioners in adversarial machine learning.

The paper scales provable adversarial defenses to larger models in three ways: it extends the training procedures to general networks with skip connections and general nonlinearities, adopts a nonlinear random projection so that training scales linearly rather than quadratically in the number of hidden units, and uses cascade models to further reduce robust error. The resulting classifiers substantially improve provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon=0.1$) and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon=2/255$).

Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon=0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon=2/255$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/.
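
To illustrate why a random projection can help with $\ell_\infty$ bounds, here is a minimal sketch. It is not the code from the repository above. It assumes the projection is a Cauchy random projection with a median estimator, a standard way to estimate $\ell_1$ norms, and it uses a single linear layer rather than a full network. For a linear score $w^\top x$, the worst case over $\|\delta\|_\infty \le \epsilon$ is $w^\top x - \epsilon \|w\|_1$, so a robust bound reduces to an $\ell_1$ norm. The names `cauchy_l1_estimate` and `linear_lower_bound` are hypothetical helpers introduced for this example.

```python
import math
import random
import statistics


def cauchy_l1_estimate(v, num_projections=501, seed=0):
    """Estimate ||v||_1 from Cauchy random projections (illustrative helper)."""
    rng = random.Random(seed)
    projections = []
    for _ in range(num_projections):
        # Standard Cauchy samples via the inverse CDF: tan(pi * (u - 1/2)).
        dot = sum(x * math.tan(math.pi * (rng.random() - 0.5)) for x in v)
        # Each dot product is Cauchy-distributed with scale ||v||_1.
        projections.append(abs(dot))
    # The median of |Cauchy(0, s)| equals s, so the sample median estimates ||v||_1.
    return statistics.median(projections)


def linear_lower_bound(w, x, eps, l1_norm):
    """Worst case of w.x over ||delta||_inf <= eps, i.e. w.x - eps * ||w||_1."""
    return sum(wi * xi for wi, xi in zip(w, x)) - eps * l1_norm


# Toy data: random weights and an input in [0, 1], with eps = 0.1.
rng = random.Random(1)
w = [rng.uniform(-1.0, 1.0) for _ in range(200)]
x = [rng.uniform(0.0, 1.0) for _ in range(200)]

exact_l1 = sum(abs(wi) for wi in w)
approx_l1 = cauchy_l1_estimate(w)

exact_bound = linear_lower_bound(w, x, 0.1, exact_l1)
approx_bound = linear_lower_bound(w, x, 0.1, approx_l1)
print(exact_l1, approx_l1, exact_bound, approx_bound)
```

Because `approx_l1` is only an estimate, `approx_bound` is not a guaranteed lower bound. An estimate of this kind is therefore better suited to speeding up training than to reporting a final certificate, which would use the exact norm.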
