LG | ML | Jan 17, 2020

Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

arXiv:2001.06263v2 | 45 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more expressive and stable neural networks, particularly in applications requiring controlled smoothness, though it is incremental as it builds on existing activation function research.

The authors address the problem of increasing neural network capacity while controlling the Lipschitz constant. They introduce a variational framework for learning activation functions, which reduces to a finite-dimensional minimization with sparse nonlinearities, and they report empirical improvements over standard ReLU variants.

We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper bound of the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an ℓ1 penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU, and we empirically demonstrate the practical aspects of our framework.
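The linear-spline activations and the ℓ1 penalty described in the abstract can be illustrated with a minimal sketch. It assumes one common parameterization of a continuous piecewise-linear function, an affine term plus a sum of shifted ReLUs, and assumes the penalty acts on the ReLU coefficients. The function names, argument names, and defaults below are hypothetical and are not taken from the authors' code.

```python
def spline_activation(x, knots, a, b0=0.0, b1=1.0):
    """Continuous piecewise-linear activation (assumed parameterization):
    b0 + b1*x + sum_k a[k] * max(x - knots[k], 0)."""
    return b0 + b1 * x + sum(ak * max(x - tk, 0.0) for ak, tk in zip(a, knots))


def l1_penalty(a):
    """l1 norm of the ReLU coefficients.
    A coefficient driven to zero removes its knot, giving a sparser nonlinearity."""
    return sum(abs(ak) for ak in a)


def activation_lipschitz(a, b1=1.0):
    """Largest absolute slope over all linear pieces.
    Assumes the knots paired with `a` are sorted in ascending order."""
    slope = b1                # slope to the left of the first knot
    best = abs(slope)
    for ak in a:
        slope += ak           # slope changes by a[k] when crossing knot k
        best = max(best, abs(slope))
    return best


# ReLU is the special case with a single knot at 0, a = [1], and b1 = 0.
relu_at_minus_two = spline_activation(-2.0, knots=[0.0], a=[1.0], b1=0.0)
relu_at_three = spline_activation(3.0, knots=[0.0], a=[1.0], b1=0.0)
```

In this sketch, ReLU, LeakyReLU, and PReLU all correspond to a single knot at zero, so a learned spline with more active knots is strictly more expressive, while the largest absolute slope gives the Lipschitz constant of the individual activation.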

Code Implementations: 1 repo
