Sorting out Lipschitz function approximation
This work addresses the problem of preserving expressive power in Lipschitz-constrained neural networks, a concern for researchers and practitioners in adversarial robustness and generative modeling. It proposes a novel method rather than an incremental improvement.
The paper tackles the challenge of training neural networks under Lipschitz constraints for applications such as adversarial robustness and Wasserstein distance estimation. It proposes combining the GroupSort activation with norm-constrained weight matrices and proves that the resulting architecture is a universal Lipschitz function approximator. Empirically, it obtains tighter Wasserstein estimates and provable adversarial robustness guarantees with minimal loss of accuracy.
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial robustness, generalization bounds, interpretable gradients, and Wasserstein distance estimation. By the composition property of Lipschitz functions (the Lipschitz constant of a composition is at most the product of the constants of its parts), it suffices to ensure that each individual affine transformation or nonlinear activation is 1-Lipschitz. The challenge is to do so without sacrificing expressive power. We identify a necessary property for such an architecture: every layer must preserve the gradient norm during backpropagation. Based on this, we propose to combine a gradient norm preserving activation function, GroupSort, with norm-constrained weight matrices. We show that norm-constrained GroupSort architectures are universal Lipschitz function approximators. Empirically, we show that norm-constrained GroupSort networks achieve tighter estimates of Wasserstein distance than their ReLU counterparts and can achieve provable adversarial robustness guarantees with little cost to accuracy.
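As an illustration, the following is a minimal NumPy sketch of the two ingredients named above, built on our own assumptions rather than the authors' code. GroupSort uses group size 2 (the MaxMin special case). The norm constraint is taken to be the spectral (L2) norm, enforced by projecting each square weight matrix onto the nearest orthogonal matrix via its SVD (U V^T). This projection is a simple stand-in: the abstract does not say how the constraint is enforced, and iterative schemes such as Björck orthonormalization are a common alternative. All helper names (`group_sort`, `group_sort_jacobian`, `orthogonalize`, `make_network`, `forward`, `backward`) are hypothetical. The sketch checks two properties. First, the network is 1-Lipschitz under the L2 norm. Second, backpropagation leaves the gradient norm unchanged, because every layer Jacobian is either an orthogonal matrix or a permutation matrix.

```python
import numpy as np


def group_sort(x, group_size=2):
    """GroupSort: sort entries within contiguous groups along the last axis.

    group_size=2 gives the MaxMin activation. Sorting only rearranges
    entries, so the map is 1-Lipschitz under the L2 norm.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    if n % group_size != 0:
        raise ValueError("feature dimension must be divisible by group_size")
    grouped = x.reshape(*x.shape[:-1], n // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(x.shape)


def group_sort_jacobian(z, group_size=2):
    """Jacobian of group_sort at a 1-D input z with no ties: a permutation matrix."""
    n = z.shape[0]
    order = np.argsort(z.reshape(n // group_size, group_size), axis=-1)
    offsets = (np.arange(n // group_size) * group_size)[:, None]
    perm = (order + offsets).reshape(n)  # output i copies input perm[i]
    jac = np.zeros((n, n))
    jac[np.arange(n), perm] = 1.0
    return jac


def orthogonalize(w):
    """Nearest orthogonal matrix to square w (U V^T from its SVD).

    Assumption: the norm constraint is the spectral norm. An orthogonal W
    has spectral norm 1, and W^T preserves vector norms.
    """
    u, _, vt = np.linalg.svd(w)
    return u @ vt


def make_network(width, depth, rng):
    """Hypothetical helper: random orthogonal weights and Gaussian biases."""
    return [
        (orthogonalize(rng.standard_normal((width, width))), rng.standard_normal(width))
        for _ in range(depth)
    ]


def forward(params, x):
    """Apply x -> group_sort(W x + b) layer by layer."""
    for w, b in params:
        x = group_sort(w @ x + b)
    return x


def backward(params, x, grad_out):
    """Pull grad_out back to the input (reverse mode, written out by hand)."""
    pre_activations = []
    h = x
    for w, b in params:
        z = w @ h + b
        pre_activations.append(z)
        h = group_sort(z)
    g = grad_out
    for (w, _), z in zip(reversed(params), reversed(pre_activations)):
        g = group_sort_jacobian(z).T @ g  # through the activation
        g = w.T @ g                       # through the linear map
    return g


rng = np.random.default_rng(0)
params = make_network(width=8, depth=4, rng=rng)
x, y = rng.standard_normal(8), rng.standard_normal(8)
g = rng.standard_normal(8)

# 1-Lipschitz: the first number should not exceed the second.
print(np.linalg.norm(forward(params, x) - forward(params, y)), np.linalg.norm(x - y))
# Gradient norm preservation: the two numbers should agree up to rounding.
print(np.linalg.norm(backward(params, x, g)), np.linalg.norm(g))
```

Replacing `group_sort` with an elementwise ReLU would keep the network 1-Lipschitz. However, the ReLU Jacobian is diagonal with 0/1 entries, so the backward pass shrinks the gradient norm whenever some unit is inactive. That shrinkage is exactly what the gradient norm preservation property rules out.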