Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks
This work provides theoretical insight into the stability and generalization of neural networks; the contribution is incremental but important for researchers in machine learning theory.
This paper estimates the $\ell^p$-Lipschitz constants of deep random ReLU neural networks whose weights follow a variant of the He initialization and whose biases are drawn from symmetric distributions. It derives high-probability upper and lower bounds that differ by at most a factor that is logarithmic in the network's width and linear in its depth, obtains matching bounds for shallow networks, and shows distinct behavior for $p \in [1,2)$ versus $p \in [2,\infty]$.
This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and the biases are drawn from symmetric distributions. We derive high-probability upper and lower bounds for wide networks that differ at most by a factor that is logarithmic in the network's width and linear in its depth. In the special case of shallow networks, we obtain matching bounds. Remarkably, the behavior of the $\ell^p$-Lipschitz constant varies significantly between the regimes $p \in [1,2)$ and $p \in [2,\infty]$. For $p \in [2,\infty]$, the $\ell^p$-Lipschitz constant behaves similarly to $\Vert g \Vert_{p'}$, where $g \in \mathbb{R}^d$ is a $d$-dimensional standard Gaussian vector and $1/p + 1/p' = 1$. In contrast, for $p \in [1,2)$, the $\ell^p$-Lipschitz constant aligns more closely with $\Vert g \Vert_{2}$.
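To make the quantity concrete, the following is a minimal numerical sketch and not the paper's method. It relies on the standard fact that, for a piecewise linear $\Phi$, the $\ell^p$-Lipschitz constant equals the essential supremum of $\Vert \nabla \Phi(x) \Vert_{p'}$, so the largest dual-norm gradient over sampled points gives an empirical lower bound. Several choices here are assumptions, since the abstract does not spell them out: hidden-layer weights with variance $2/N$ for layer width $N$, a unit-variance output layer, zero biases (a trivially symmetric distribution), Gaussian sample points, and all function names.

```python
import numpy as np


def sample_network(d, width, depth, rng):
    """Sample a random ReLU network R^d -> R with `depth` hidden layers."""
    dims = [d] + [width] * depth
    # ASSUMPTION: He-style variant with entry variance 2 / (layer width).
    weights = [
        rng.normal(0.0, np.sqrt(2.0 / dims[i + 1]), size=(dims[i + 1], dims[i]))
        for i in range(depth)
    ]
    # ASSUMPTION: zero biases, i.e. a point mass at 0 (symmetric).
    biases = [np.zeros(dims[i + 1]) for i in range(depth)]
    # ASSUMPTION: standard Gaussian output layer.
    w_out = rng.normal(0.0, 1.0, size=width)
    return weights, biases, w_out


def forward(x, weights, biases, w_out):
    """Evaluate the network at x."""
    h = x
    for W, b in zip(weights, biases):
        h = np.maximum(W @ h + b, 0.0)
    return float(w_out @ h)


def gradient(x, weights, biases, w_out):
    """Gradient of the network at x, using the active ReLU pattern."""
    h = x
    masks = []
    for W, b in zip(weights, biases):
        pre = W @ h + b
        mask = (pre > 0).astype(float)
        masks.append(mask)
        h = pre * mask
    # Backpropagate: grad = w_out D_L W_L ... D_1 W_1 (as a row vector).
    grad = w_out
    for W, mask in zip(reversed(weights), reversed(masks)):
        grad = (grad * mask) @ W
    return grad


def dual_exponent(p):
    """Return p' with 1/p + 1/p' = 1."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)


def lipschitz_lower_bound(p, weights, biases, w_out, n_samples, rng):
    """Max of ||grad Phi(x)||_{p'} over random points: a LOWER bound only."""
    d = weights[0].shape[1]
    q = dual_exponent(p)
    best = 0.0
    for _ in range(n_samples):
        x = rng.normal(size=d)
        g = gradient(x, weights, biases, w_out)
        best = max(best, float(np.linalg.norm(g, ord=q)))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 20
    net = sample_network(d=d, width=256, depth=3, rng=rng)
    g = rng.normal(size=d)  # reference Gaussian vector from the abstract
    for p in (1.0, 1.5, 2.0, 4.0, np.inf):
        lb = lipschitz_lower_bound(p, *net, n_samples=200, rng=rng)
        ref = np.linalg.norm(g, ord=dual_exponent(p))
        print(f"p={p}: sampled lower bound={lb:.3f}, ||g||_p'={ref:.3f}")
```

Because random sampling only probes finitely many linear regions, the reported value can underestimate the true constant and cannot confirm the upper bounds. It is meant only as a rough way to compare against $\Vert g \Vert_{p'}$ and $\Vert g \Vert_2$ across the two regimes of $p$.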