ML · LG · Jun 17, 2021

Wide stochastic networks: Gaussian limit and PAC-Bayesian training

arXiv:2106.09798v3 · 12 citations
AI Analysis

This work aims to improve generalization in stochastic neural networks. It is incremental in that it extends existing infinite-width limit results to a stochastic architecture.

The paper analyzes and trains over-parameterized stochastic neural networks. It establishes a Gaussian limit for infinite-width networks, holding both before and during training, and uses the resulting explicit output distribution to build a PAC-Bayesian training procedure that directly optimizes the generalization bound. For a large but finite-width network, the authors report empirically that this approach can outperform standard PAC-Bayesian methods on MNIST; no specific numerical gains are given here.

The limit of infinite width allows for substantial simplifications in the analytical study of over-parameterised neural networks. With a suitable random initialisation, an extremely large network exhibits an approximately Gaussian behaviour. In the present work, we establish a similar result for a simple stochastic architecture whose parameters are random variables, holding both before and during training. The explicit evaluation of the output distribution allows for a PAC-Bayesian training procedure that directly optimises the generalisation bound. For a large but finite-width network, we show empirically on MNIST that this training approach can outperform standard PAC-Bayesian methods.
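To make "directly optimises the generalisation bound" concrete, here is a minimal sketch of how a PAC-Bayesian bound is evaluated for a factorised Gaussian prior and posterior over the parameters. It uses a generic McAllester-type bound, which is an assumption made for illustration and not necessarily the objective used in the paper. The function names `kl_diag_gaussians` and `mcallester_bound`, the toy parameter values, and the empirical risk of 0.05 are all hypothetical; n = 60000 is the size of the MNIST training set.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between factorised Gaussians.

    Each argument is an equal-length list; sigma_* are standard deviations.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """McAllester-type bound on the expected risk of the posterior Q.

    With probability at least 1 - delta over the sample of size n:
        risk <= emp_risk + sqrt((KL + ln(2 * sqrt(n) / delta)) / (2 * n))
    The loss is assumed to take values in [0, 1].
    """
    complexity = (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return emp_risk + math.sqrt(complexity)


# Toy illustration (hypothetical values, not results from the paper).
prior_mu, prior_sigma = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
post_mu, post_sigma = [0.3, -0.2, 0.1], [0.5, 0.5, 0.5]

kl = kl_diag_gaussians(post_mu, post_sigma, prior_mu, prior_sigma)
bound = mcallester_bound(emp_risk=0.05, kl=kl, n=60000)
print(f"KL = {kl:.4f}, bound = {bound:.4f}")
```

In a bound-driven training loop, the right-hand side would be minimised over the posterior parameters (`post_mu`, `post_sigma` here), trading empirical risk against the KL term. The role of the Gaussian limit described in the abstract is to make the network's output distribution explicit, so that the risk term can be evaluated directly.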

Code Implementations: 1 repo
