Manifold Regularization for Locally Stable Deep Neural Networks
This work addresses adversarial robustness in deep learning models, offering an incremental improvement through efficient regularization techniques.
The paper applies manifold regularization to train deep neural networks that are locally stable under several perturbation models, achieving 40% adversarial accuracy on CIFAR-10 against an adaptive PGD attack with $\ell_\infty$ perturbations of size $\epsilon = 8/255$, and state-of-the-art verified accuracy of 21% in the same perturbation model.
We apply concepts from manifold regularization to develop new regularization techniques for training locally stable deep neural networks. Our regularizers are based on a sparsification of the graph Laplacian which holds with high probability when the data is sparse in high dimensions, as is common in deep learning. Empirically, our networks exhibit stability in a diverse set of perturbation models, including $\ell_2$, $\ell_\infty$, and Wasserstein-based perturbations; in particular, we achieve 40% adversarial accuracy on CIFAR-10 against an adaptive PGD attack using $\ell_\infty$ perturbations of size $\epsilon = 8/255$, and state-of-the-art verified accuracy of 21% in the same perturbation model. Furthermore, our techniques are efficient, incurring overhead on par with two additional parallel forward passes through the network.
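To make the idea of a graph-Laplacian regularizer concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The first two functions compute the classical manifold regularization penalty $\mathrm{tr}(F^\top L F) = \tfrac{1}{2}\sum_{i,j} W_{ij}\,\|f(x_i) - f(x_j)\|^2$ in two equivalent ways. The third, `sparsified_penalty`, illustrates one plausible reading of the sparsification: it keeps only the edge between two perturbed copies of each input, which costs two extra forward passes. The Gaussian edge weights, the uniform $\ell_\infty$ sampling, the clipping to $[0, 1]$, and all function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    """Dense symmetric edge weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def laplacian_penalty(F, W):
    """tr(F^T L F) with the unnormalized graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(F.T @ L @ F))

def pairwise_penalty(F, W):
    """The same quantity written as (1/2) * sum_ij W_ij ||F_i - F_j||^2."""
    diff = F[:, None, :] - F[None, :, :]
    return float(0.5 * np.sum(W * np.sum(diff ** 2, axis=-1)))

def sparsified_penalty(f, X, eps, rng):
    """Hypothetical sparsified variant: one edge per input, connecting two
    perturbed copies x + delta and x - delta with ||delta||_inf <= eps.
    Requires two forward passes of f in addition to the clean one."""
    delta = rng.uniform(-eps, eps, size=X.shape)
    x_plus = np.clip(X + delta, 0.0, 1.0)   # assumes inputs scaled to [0, 1]
    x_minus = np.clip(X - delta, 0.0, 1.0)
    out_diff = f(x_plus) - f(x_minus)
    return float(np.mean(np.sum(out_diff ** 2, axis=-1)))
```

In a training loop, a term of this kind would be added to the classification loss with a weighting coefficient. The dense penalty needs all pairwise distances in a batch, while the sparsified sketch only evaluates the model on two perturbed copies of each input, which is consistent with the overhead described above.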