LG · AI · CR · ML · Feb 10, 2021

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

arXiv:2102.05363v4
62 citations
Originality: Highly original
AI Analysis

This paper addresses the critical issue of adversarial robustness in neural networks for security-sensitive applications. It offers a new theoretical foundation rather than an incremental improvement.

The paper tackles the vulnerability of neural networks to adversarial perturbations by designing a new type of neuron based on L-infinity distance. Networks built from these neurons are 1-Lipschitz with respect to the L-infinity norm, which yields certified robustness guarantees directly from the margin of the outputs. Reported results include state-of-the-art certified accuracies, such as 93.09% on MNIST with ε=0.3.
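The reasoning behind the guarantee can be sketched in a few lines. The notation here (weight vector $w$, bias $b$, network output $g$, predicted class $y$) is assumed for illustration and does not appear in the summary above. The sketch also assumes that the predicted class is the index of the largest output. A neuron that computes an $\ell_\infty$ distance plus a bias satisfies, by the reverse triangle inequality:

$$
u(x) = \lVert x - w \rVert_\infty + b, \qquad
|u(x) - u(x')| = \bigl|\, \lVert x - w \rVert_\infty - \lVert x' - w \rVert_\infty \bigr| \le \lVert x - x' \rVert_\infty
$$

A layer of such neurons is therefore 1-Lipschitz from $\ell_\infty$ to $\ell_\infty$, and so is any composition of such layers. Each output $g_j$ can then move by at most $\lVert \delta \rVert_\infty$ under a perturbation $\delta$, so the gap between two outputs shrinks by at most $2\lVert \delta \rVert_\infty$:

$$
g_y(x) - \max_{j \ne y} g_j(x) > 2\epsilon
\;\Longrightarrow\;
\arg\max_j g_j(x + \delta) = y \quad \text{for all } \lVert \delta \rVert_\infty \le \epsilon
$$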

It is well known that standard neural networks, even with high classification accuracy, are vulnerable to small $\ell_\infty$-norm bounded adversarial perturbations. Although many attempts have been made, most previous works either provide only empirical verification of the defense against a particular attack method, or develop a certified guarantee of model robustness only in limited scenarios. In this paper, we seek a new approach to develop a theoretically principled neural network that inherently resists $\ell_\infty$ perturbations. In particular, we design a novel neuron that uses $\ell_\infty$-distance as its basic operation (which we call the $\ell_\infty$-dist neuron), and show that any neural network constructed with $\ell_\infty$-dist neurons (called an $\ell_\infty$-dist net) is naturally a 1-Lipschitz function with respect to the $\ell_\infty$-norm. This directly provides a rigorous guarantee of certified robustness based on the margin of the prediction outputs. We then prove that such networks have enough expressive power to approximate any 1-Lipschitz function with a robust generalization guarantee. We further provide a holistic training strategy that can greatly alleviate optimization difficulties. Experimental results show that using $\ell_\infty$-dist nets as basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09% certified accuracy on MNIST ($\epsilon=0.3$), 35.42% on CIFAR-10 ($\epsilon=8/255$) and 16.31% on TinyImageNet ($\epsilon=1/255$).
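A minimal sketch of the idea in the abstract, in plain Python, may help make it concrete. It assumes each neuron computes $\lVert x - w \rVert_\infty + b$ and that the predicted class is the index of the largest output; the paper's actual output convention, architecture, and training strategy are not reproduced here. All function names, weights, and inputs below are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One layer is (weights, biases): one weight vector and one bias per neuron.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def linf_dist_neuron(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Assumed neuron form: the l_inf distance between x and w, plus a bias."""
    return max(abs(xi - wi) for xi, wi in zip(x, w)) + b


def linf_dist_net(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Stack layers of l_inf-dist neurons; no other activation is applied."""
    h = list(x)
    for weights, biases in layers:
        h = [linf_dist_neuron(h, w, b) for w, b in zip(weights, biases)]
    return h


def certified_radius(outputs: Sequence[float]) -> float:
    """Half the gap between the largest and second-largest outputs.

    For a 1-Lipschitz network (w.r.t. l_inf), each output moves by at most
    eps under a perturbation of size eps, so the top class cannot change
    while eps stays below this value. Needs at least two outputs.
    """
    top, runner_up = sorted(outputs, reverse=True)[:2]
    return (top - runner_up) / 2.0


def is_certified(outputs: Sequence[float], eps: float) -> bool:
    """True if the prediction is provably unchanged for all ||delta||_inf <= eps."""
    return certified_radius(outputs) > eps


# Hypothetical toy network: 2 inputs -> 3 hidden neurons -> 2 outputs.
layers: List[Layer] = [
    ([[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]], [0.0, 0.0, 0.0]),
    ([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [0.0, 0.0]),
]
x = [0.2, 0.1]

outputs = linf_dist_net(x, layers)
radius = certified_radius(outputs)
print("outputs:", outputs)
print("certified l_inf radius:", radius)
print("certified at eps=0.01:", is_certified(outputs, 0.01))
```

Certification in this sketch costs a single forward pass: the certificate is read directly from the output margin, with no separate verification procedure.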

Code Implementations: 2 repos
