ML | LG | Oct 9, 2018

Dropout as a Structured Shrinkage Prior

arXiv:1810.04045v3 | 29 citations
AI Analysis

This work provides a theoretical foundation for dropout. The contribution is incremental, but it clarifies dropout's role as a Bayesian prior, which is useful to researchers working on machine learning and neural network regularization.

The paper addresses how dropout regularizes deep neural networks. It shows that multiplicative noise induces structured shrinkage priors on a network's weights, and it proposes a shrinkage framework for resnets with a prior termed automatic depth determination. It also investigates two inference strategies that improve upon dropout's MAP approximation in regression benchmarks.

Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior 'automatic depth determination' as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.
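The scale-mixture reparametrization mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper or its repository. It assumes a single scalar weight with a zero-mean Gaussian prior and Bernoulli noise, and the function names and parameters (`sigma`, `p_keep`) are hypothetical. It compares two sampling routes: multiplying a Gaussian draw by the noise variable, and drawing from a Gaussian whose scale is set by the noise variable. Under these assumptions both routes should give the same distribution, a spike-and-slab shape with a point mass at zero.

```python
import random
import statistics


def sample_noise_times_weight(n, sigma, p_keep, rng):
    """Route 1: draw z ~ N(0, sigma^2), then multiply by xi ~ Bernoulli(p_keep)."""
    samples = []
    for _ in range(n):
        xi = 1.0 if rng.random() < p_keep else 0.0
        z = rng.gauss(0.0, sigma)
        samples.append(xi * z)
    return samples


def sample_scale_mixture(n, sigma, p_keep, rng):
    """Route 2: draw xi ~ Bernoulli(p_keep), then w ~ N(0, (xi * sigma)^2)."""
    samples = []
    for _ in range(n):
        xi = 1.0 if rng.random() < p_keep else 0.0
        # A standard deviation of zero collapses the draw to exactly 0.0.
        samples.append(rng.gauss(0.0, xi * sigma))
    return samples


def fraction_zero(samples):
    """Share of samples that are exactly zero (the 'spike' of the mixture)."""
    return sum(1 for s in samples if s == 0.0) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    a = sample_noise_times_weight(100_000, sigma=2.0, p_keep=0.8, rng=rng)
    b = sample_scale_mixture(100_000, sigma=2.0, p_keep=0.8, rng=rng)
    # Both variances should be close to p_keep * sigma^2.
    print("variance, route 1:", statistics.pvariance(a))
    print("variance, route 2:", statistics.pvariance(b))
    print("zero mass, route 1:", fraction_zero(a))
    print("zero mass, route 2:", fraction_zero(b))
```

With Bernoulli noise the mixture has only two scales, 0 and `sigma`. Swapping in a continuous noise distribution for `xi` would give a continuous scale mixture instead, which is the more general case the abstract refers to.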

Code Implementations: 1 repo