ML LG · Jun 10, 2020

Banach Space Representer Theorems for Neural Networks and Ridge Splines

arXiv:2006.05626v3 · 27.133 citations

Originality: Incremental advance

AI Analysis

This work provides theoretical insight into neural network regularization and generalization, but it is incremental: it builds on existing variational spline theory and is restricted to single-hidden-layer architectures.

The authors study the functions learned by neural networks through a variational framework with total variation-like regularization. They show that finite-width, single-hidden-layer neural networks solve the resulting inverse problems and that the regularizers promote desirable generalization properties.

We develop a variational framework to understand the properties of the functions learned by neural networks fit to data. We propose and study a family of continuous-domain linear inverse problems with total variation-like regularization in the Radon domain subject to data fitting constraints. We derive a representer theorem showing that finite-width, single-hidden-layer neural networks are solutions to these inverse problems. We draw on many techniques from variational spline theory and so we propose the notion of polynomial ridge splines, which correspond to single-hidden-layer neural networks with truncated power functions as the activation function. The representer theorem is reminiscent of the classical reproducing kernel Hilbert space representer theorem, but we show that the neural network problem is posed over a non-Hilbertian Banach space. While the learning problems are posed in the continuous domain, similar to kernel methods, the problems can be recast as finite-dimensional neural network training problems. These neural network training problems have regularizers which are related to the well-known weight decay and path-norm regularizers. Thus, our result gives insight into functional characteristics of trained neural networks and also into the design of neural network regularizers. We also show that these regularizers promote neural network solutions with desirable generalization properties.
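To make the objects named in the abstract concrete, here is a minimal NumPy sketch of a single-hidden-layer network with a truncated power activation, together with two regularizers of the path-norm and weight-decay type. The exact forms used below (the `w·x - b` bias convention, the exponents `m - 1` and `2m - 2`, and all function names) are illustrative assumptions, not definitions taken from this page; the paper itself should be consulted for the precise statements. The sketch only illustrates two elementary facts: rescaling a neuron's weights leaves the function and the path-norm-style penalty unchanged, and the weight-decay-style penalty upper-bounds the path-norm-style penalty.

```python
import math
import numpy as np

def truncated_power(t, m=2):
    """rho_m(t) = max(0, t)^(m-1) / (m-1)!  (assumes integer m >= 2; m = 2 is the ReLU)."""
    return np.maximum(t, 0.0) ** (m - 1) / math.factorial(m - 1)

def ridge_spline(X, W, b, v, m=2):
    """Evaluate f(x) = sum_k v_k * rho_m(w_k . x - b_k) for each row x of X.

    Shapes: X is (n, d), W is (K, d), b and v are (K,). The bias sign is an assumed convention.
    """
    return truncated_power(X @ W.T - b, m) @ v

def path_norm(W, v, m=2):
    """Path-norm-style penalty (assumed form): sum_k |v_k| * ||w_k||_2^(m-1)."""
    return float(np.sum(np.abs(v) * np.linalg.norm(W, axis=1) ** (m - 1)))

def weight_decay(W, v, m=2):
    """Weight-decay-style penalty (assumed form): 0.5 * sum_k (v_k^2 + ||w_k||_2^(2m-2))."""
    return 0.5 * float(np.sum(v ** 2 + np.linalg.norm(W, axis=1) ** (2 * m - 2)))

def rescale(W, b, v, a, m=2):
    """Scale neuron k's (w_k, b_k) by a_k > 0 and v_k by a_k^-(m-1).

    rho_m is positively homogeneous of degree m-1, so the network function is unchanged.
    """
    a = np.asarray(a, dtype=float)
    return W * a[:, None], b * a, v / a ** (m - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    W, b, v = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=4)
    W2, b2, v2 = rescale(W, b, v, rng.uniform(0.5, 2.0, size=4))
    print("same function:", np.allclose(ridge_spline(X, W, b, v), ridge_spline(X, W2, b2, v2)))
    print("path-norm before/after:", path_norm(W, v), path_norm(W2, v2))
    print("weight decay before/after:", weight_decay(W, v), weight_decay(W2, v2))
```

Under these assumed forms, the inequality `path_norm <= weight_decay` follows term by term from the arithmetic-geometric mean inequality, with equality when each neuron is balanced so that `|v_k| = ||w_k||^(m-1)`. This is one way to see why the two penalties can be related through rescaling.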

