ML LGOct 17, 2019

Why bigger is not always better: on finite and infinite neural networks

arXiv:1910.08013v319.561 citations

Originality Incremental advance

AI Analysis

This addresses a key limitation in neural network theory for researchers and practitioners, offering a more flexible model that bridges the gap between infinite and finite networks, though it is incremental in proposing a new class based on existing concepts.

The paper tackles the problem that infinite Bayesian neural networks lack representation learning, leading to worse performance compared to finite networks, and shows empirically that SOTA architectures like ResNets align more with finite network representations, motivating the introduction of infinite networks with bottlenecks to combine theoretical tractability with representation learning.

Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any form of representation learning. The lack of representation or equivalently kernel learning leads to less flexibility and hence worse performance, giving a potential explanation for the inferior performance of infinite networks observed in the literature (e.g. Novak et al. 2019). We give analytic results characterising the prior over representations and representation learning in finite deep linear networks. We show empirically that the representations in SOTA architectures such as ResNets trained with SGD are much closer to those suggested by our deep linear results than by the corresponding infinite network. This motivates the introduction of a new class of network: infinite networks with bottlenecks, which inherit the theoretical tractability of infinite networks while at the same time allowing representation learning.

View on arXiv PDF

Similar