ML LG · Feb 16, 2021

A Law of Robustness for Weight-bounded Neural Networks

arXiv:2102.08093v2 · 3.62 citations

Originality: Incremental advance

AI Analysis

This work addresses adversarial robustness in deep learning. Its theoretical contribution is incremental: it proves, under a bounded-weights assumption, a lower bound that had previously only been conjectured.

The paper characterizes the robustness of neural networks to adversarial perturbations by deriving a lower bound on the Lipschitz constant of any model class with bounded Rademacher complexity. For multi-layer networks, the bound implies that $\log n$ constant-sized layers are required to robustly fit a dataset of size $n$, which gives formal evidence that over-parametrization is necessary in deep learning.

Robustness of deep neural networks against adversarial perturbations is a pressing concern motivated by recent findings showing the pervasive nature of such vulnerabilities. One method of characterizing the robustness of a neural network model is through its Lipschitz constant, which forms a robustness certificate. A natural question to ask is, for a fixed model class (such as neural networks) and a dataset of size $n$, what is the smallest achievable Lipschitz constant among all models that fit the dataset? Recently, Bubeck et al. (2020) conjectured that when using two-layer networks with $k$ neurons to fit a generic dataset, the smallest Lipschitz constant is $\Omega(\sqrt{n/k})$. This implies that one would require one neuron per data point to robustly fit the data. In this work we derive a lower bound on the Lipschitz constant for an arbitrary model class with bounded Rademacher complexity. Our result coincides with that conjectured in Bubeck et al. (2020) for two-layer networks under the assumption of bounded weights. However, due to our result's generality, we also derive bounds for multi-layer neural networks, discovering that one requires $\log n$ constant-sized layers to robustly fit the data. Thus, our work establishes a law of robustness for weight-bounded neural networks and provides formal evidence on the necessity of over-parametrization in deep learning.
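To make the role of the Lipschitz constant concrete, the following is a minimal sketch, not the paper's method. It assumes a two-layer ReLU network $f(x) = w_2^\top \mathrm{relu}(W_1 x)$ and uses the standard product-of-spectral-norms upper bound on its Lipschitz constant. The function names are hypothetical, and `conjectured_scaling` returns only the $\sqrt{n/k}$ rate, since the $\Omega(\cdot)$ notation hides an unspecified constant.

```python
import numpy as np


def two_layer_relu(W1, w2, x):
    """Evaluate f(x) = w2 . relu(W1 x) for a single input vector x."""
    return float(w2 @ np.maximum(W1 @ x, 0.0))


def lipschitz_upper_bound(W1, w2):
    """Upper bound on the L2 Lipschitz constant of the two-layer ReLU net.

    ReLU is 1-Lipschitz, so the product of the layer norms bounds the
    Lipschitz constant of the composition. This is an upper bound only;
    the true constant may be smaller.
    """
    spectral_norm_W1 = np.linalg.norm(W1, 2)  # largest singular value
    return float(spectral_norm_W1 * np.linalg.norm(w2))


def certified_radius(margin, lipschitz):
    """Radius within which sign(f(x)) cannot change if |f(x)| = margin.

    Follows from |f(x) - f(y)| <= lipschitz * ||x - y||_2.
    """
    return margin / lipschitz


def conjectured_scaling(n, k):
    """The sqrt(n / k) rate from the conjecture, up to an unknown constant."""
    return float(np.sqrt(n / k))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(8, 4))  # illustrative weights: k = 8 neurons, 4 inputs
    w2 = rng.normal(size=8)
    x = rng.normal(size=4)

    L = lipschitz_upper_bound(W1, w2)
    margin = abs(two_layer_relu(W1, w2, x))
    print("Lipschitz upper bound:", L)
    print("certified L2 radius at x:", certified_radius(margin, L))
    print("sqrt(n/k) for n=100, k=25:", conjectured_scaling(100, 25))
```

Under this rate, keeping the smallest achievable Lipschitz constant at constant order requires $k$ to grow proportionally to $n$, which is the "one neuron per data point" reading in the abstract.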
