LG · ML · May 23, 2024

A Rescaling-Invariant Lipschitz Bound Based on Path-Metrics for Modern ReLU Network Parameterizations

Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval

arXiv:2405.15006v3 · 4.62 citations · h-index: 10 · ICML

Originality: Incremental advance

AI Analysis

This work addresses the robustness guarantees that underpin generalization, pruning, and quantization in deep learning, particularly for ReLU networks. It is incremental in that it builds on prior Lipschitz bounds and makes them aware of the networks' rescaling symmetries.

The authors tackle two limitations of existing Lipschitz bounds for ReLU networks: they are not invariant to neuron-wise rescaling, and they apply only to plain MLPs. They prove a new Lipschitz inequality based on the ℓ¹-path-metric that is rescaling-invariant and applies to diverse architectures such as ResNets and CNNs. A proof-of-concept shows pruning performance matching classical magnitude pruning while being immune to rescalings.
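To make the rescaling-invariance claim concrete, here is a minimal sketch assuming a tiny bias-free one-hidden-layer ReLU network, small enough that the path-lifting Φ(θ) (one product of weights per input-to-output path) can be enumerated explicitly. The paper's setting (general ReLU DAGs with biases, convolutions, pooling, and so on) is much broader, and the helper names `path_lifting`, `l1_path_metric`, `l1_param_distance` and `rescale` are hypothetical, not taken from the authors' code. The sketch compares the ℓ¹-path-metric with a plain ℓ¹ distance in parameter space before and after a neuron-wise rescaling.

```python
from itertools import product

# Hypothetical helpers for a bias-free one-hidden-layer ReLU network.
# theta = (w1, w2): w1[j][i] is input i -> hidden j, w2[k][j] is hidden j -> output k.

def path_lifting(w1, w2):
    """Map each path i -> j -> k to the product of the weights along it."""
    n_hidden, n_in, n_out = len(w1), len(w1[0]), len(w2)
    return {
        (i, j, k): w1[j][i] * w2[k][j]
        for i, j, k in product(range(n_in), range(n_hidden), range(n_out))
    }

def l1_path_metric(theta, theta_prime):
    """l1 distance between the path-liftings of two parameter sets."""
    phi, phi_prime = path_lifting(*theta), path_lifting(*theta_prime)
    return sum(abs(phi[p] - phi_prime[p]) for p in phi)

def l1_param_distance(theta, theta_prime):
    """Plain l1 distance between the raw weights (not rescaling-invariant)."""
    return sum(
        abs(x - y)
        for mat, mat_prime in zip(theta, theta_prime)
        for row, row_prime in zip(mat, mat_prime)
        for x, y in zip(row, row_prime)
    )

def rescale(theta, lambdas):
    """Neuron-wise rescaling with lambdas[j] > 0: incoming weights of hidden
    neuron j are multiplied by lambdas[j], outgoing ones divided by it.
    Since relu(c * x) = c * relu(x) for c > 0, the network function is unchanged."""
    w1, w2 = theta
    new_w1 = [[lam * w for w in row] for lam, row in zip(lambdas, w1)]
    new_w2 = [[w / lam for w, lam in zip(row, lambdas)] for row in w2]
    return new_w1, new_w2

theta = ([[1.0, -2.0], [0.5, 3.0]], [[2.0, -1.0]])
theta_prime = ([[1.0, -2.0], [0.0, 3.0]], [[2.0, -1.0]])  # one weight pruned
lambdas = [10.0, 0.1]
theta_r, theta_prime_r = rescale(theta, lambdas), rescale(theta_prime, lambdas)

d_path = l1_path_metric(theta, theta_prime)
d_path_rescaled = l1_path_metric(theta_r, theta_prime_r)
d_param = l1_param_distance(theta, theta_prime)
d_param_rescaled = l1_param_distance(theta_r, theta_prime_r)
print(d_path, d_path_rescaled, d_param, d_param_rescaled)
```

In this example the path-metric stays at 0.5 after rescaling, while the parameter distance drops from 0.5 to 0.05 even though each rescaled network computes exactly the same function as before; a bound stated in parameter space inherits this dependence on an arbitrary choice of scaling.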

Robustness with respect to weight perturbations underpins guarantees for generalization, pruning and quantization. Existing guarantees rely on Lipschitz bounds in parameter space, cover only plain feed-forward MLPs, and break under the ubiquitous neuron-wise rescaling symmetry of ReLU networks. We prove a new Lipschitz inequality expressed through the $\ell^1$-path-metric of the weights. The bound is (i) rescaling-invariant by construction and (ii) applies to any ReLU-DAG architecture with any combination of convolutions, skip connections, pooling, and frozen (inference-time) batch normalization, thus encompassing ResNets, U-Nets, VGG-style CNNs, and more. By respecting the network's natural symmetries, the new bound strictly sharpens prior parameter-space bounds and can be computed in two forward passes. To illustrate its utility, we derive from it a symmetry-aware pruning criterion and show, through a proof-of-concept experiment on a ResNet-18 trained on ImageNet, that its pruning performance matches that of classical magnitude pruning while being fully immune to arbitrary neuron-wise rescalings.
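The abstract mentions computation in two forward passes and a symmetry-aware pruning criterion without spelling either out. The sketch below illustrates one plausible reading in the simplest setting, a bias-free ReLU MLP; it is an assumption for illustration, not the authors' algorithm. Zeroing a weight kills exactly the paths through it and leaves every other path value unchanged, so the ℓ¹-path-metric between θ and the pruned θ' equals the difference of their ℓ¹-path-norms. Each path-norm is one forward pass of an all-ones input through the network with absolute-valued weights. The score `path_pruning_score` and the other helper names are hypothetical.

```python
def relu(x):
    return max(0.0, x)

def forward(layers, x):
    """Bias-free ReLU MLP: layers[l][j][i] is the weight from unit i of layer l
    to unit j of layer l + 1; ReLU on hidden layers, identity on the output."""
    for depth, w in enumerate(layers):
        x = [sum(wij * xi for wij, xi in zip(row, x)) for row in w]
        if depth < len(layers) - 1:
            x = [relu(v) for v in x]
    return x

def l1_path_norm(layers):
    """||Phi(theta)||_1 in one forward pass: with nonnegative weights and an
    all-ones input every ReLU acts as the identity, so the summed output is the
    sum over all input-output paths of the product of |weights|."""
    abs_layers = [[[abs(w) for w in row] for row in layer] for layer in layers]
    ones = [1.0] * len(layers[0][0])
    return sum(forward(abs_layers, ones))

def prune(layers, l, j, i):
    """Copy of the weights with the single entry layers[l][j][i] set to zero."""
    pruned = [[list(row) for row in layer] for layer in layers]
    pruned[l][j][i] = 0.0
    return pruned

def path_pruning_score(layers, l, j, i):
    """l1-path-metric between theta and theta with one weight zeroed,
    computed as a difference of two path-norms (two forward passes)."""
    return l1_path_norm(layers) - l1_path_norm(prune(layers, l, j, i))

theta = [[[1.0, -2.0], [0.5, 3.0]], [[2.0, -1.0]]]
# Hidden neuron 1 rescaled: incoming weights times 0.1, outgoing weight divided by 0.1.
theta_rescaled = [[[1.0, -2.0], [0.05, 0.3]], [[2.0, -10.0]]]

score = path_pruning_score(theta, 0, 1, 0)
score_rescaled = path_pruning_score(theta_rescaled, 0, 1, 0)
magnitude, magnitude_rescaled = abs(theta[0][1][0]), abs(theta_rescaled[0][1][0])
print(score, score_rescaled, magnitude, magnitude_rescaled)
```

Both networks compute the same function, and the path-based score of the weight is 0.5 in each, whereas its magnitude, the quantity classical magnitude pruning ranks by, falls from 0.5 to 0.05 under the rescaling.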
