Deep neural networks have an inbuilt Occam's razor
This work addresses a fundamental problem for deep learning researchers, namely understanding why DNNs generalize, and offers a theoretical explanation for their success. It builds incrementally on existing Bayesian and complexity theories.
The study investigates why overparameterized deep neural networks generalize well by applying a Bayesian framework to supervised learning. It finds that structured data, combined with an intrinsic inductive bias towards simple functions, is key to their success, and the analysis accurately predicts the posterior for DNNs trained with stochastic gradient descent.
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
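The function-based Bayesian picture described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a toy one-hidden-layer tanh network on 3-bit Boolean inputs, estimates the prior over functions by sampling random parameters, and uses a simple 0-1 likelihood (a function either fits the training labels or it does not) in place of the error-spectrum approximation used in the paper. The names `sample_prior`, `posterior`, `sigma_w`, and `width` are hypothetical and chosen only for illustration.

```python
import itertools
from collections import Counter

import numpy as np


def sample_prior(n_bits=3, width=16, n_samples=2000, sigma_w=1.0, seed=0):
    """Estimate P(f) by sampling parameters of a toy one-hidden-layer tanh network.

    Each function f is encoded as a bit string of its outputs on all
    2**n_bits Boolean inputs. sigma_w (an assumed knob) scales the weights.
    """
    rng = np.random.default_rng(seed)
    # All 2**n_bits Boolean inputs, one per row, in lexicographic order.
    X = np.array(list(itertools.product([0.0, 1.0], repeat=n_bits)))
    counts = Counter()
    for _ in range(n_samples):
        W1 = rng.normal(0.0, sigma_w / np.sqrt(n_bits), size=(n_bits, width))
        b1 = rng.normal(0.0, 1.0, size=width)
        w2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=width)
        b2 = rng.normal(0.0, 1.0)
        out = np.tanh(X @ W1 + b1) @ w2 + b2
        # Threshold the real-valued output to obtain a Boolean function.
        f = "".join("1" if o > 0 else "0" for o in out)
        counts[f] += 1
    return {f: c / n_samples for f, c in counts.items()}


def posterior(prior, train_idx, train_labels):
    """Bayes' rule with a 0-1 likelihood: P(f|S) is proportional to P(f)
    for functions that fit the training set S, and zero otherwise."""
    consistent = {
        f: p
        for f, p in prior.items()
        if all(f[i] == y for i, y in zip(train_idx, train_labels))
    }
    z = sum(consistent.values())  # marginal likelihood P(S) under the sampled prior
    return {f: p / z for f, p in consistent.items()} if z > 0 else {}


if __name__ == "__main__":
    prior = sample_prior()
    # Training set: inputs 000 and 111 (indices 0 and 7), both labelled 0.
    post = posterior(prior, train_idx=[0, 7], train_labels=["0", "0"])
    for f, p in sorted(post.items(), key=lambda kv: -kv[1])[:5]:
        print(f, round(p, 3))
```

Varying `sigma_w` in this sketch changes how the sampled prior is spread across functions, which loosely mirrors the idea of varying the prior through the network. The toy setup makes no claim to reproduce the ordered and chaotic regimes studied in the paper.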