Dropout as a Bayesian Approximation: Appendix
This work gives dropout a theoretical foundation as an approximation to a Bayesian model, which allows Bayesian methods to be integrated into existing deep learning frameworks in a principled way.
The paper shows that a neural network with dropout applied before every weight layer is mathematically equivalent to an approximation to a well-known Bayesian model, which may explain dropout's robustness to over-fitting and makes it possible to reason about uncertainty in deep learning.
We show that a neural network of arbitrary depth and with arbitrary non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This interpretation might offer an explanation for some of dropout's key properties, such as its robustness to over-fitting. Our interpretation allows us to reason about uncertainty in deep learning, and allows the introduction of Bayesian machinery into existing deep learning frameworks in a principled way. This document is an appendix for the main paper "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" by Gal and Ghahramani, 2015.
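As a rough illustration of what reasoning about uncertainty with dropout can look like in practice, the sketch below keeps dropout active at prediction time, runs several stochastic forward passes, and summarises them by their mean and variance (a procedure commonly called Monte Carlo dropout). It is a minimal sketch under stated assumptions, not the authors' implementation: the toy two-layer network, its random untrained weights, the inverted-dropout rescaling, and all function names (`dropout`, `stochastic_forward`, `mc_dropout_predict`, `make_toy_params`) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, pvariance


def dropout(vec, p_drop, rng):
    """Zero each unit independently with probability p_drop (Bernoulli mask).

    Survivors are rescaled by 1 / (1 - p_drop) ("inverted dropout"); this
    rescaling is an implementation convention assumed for this sketch.
    """
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def linear(vec, weights, bias):
    """Compute W @ vec + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def stochastic_forward(x, params, p_drop, rng):
    """One forward pass with dropout applied before every weight layer."""
    (w1, b1), (w2, b2) = params
    hidden = linear(dropout([x], p_drop, rng), w1, b1)
    hidden = [math.tanh(v) for v in hidden]
    return linear(dropout(hidden, p_drop, rng), w2, b2)[0]


def mc_dropout_predict(x, params, p_drop=0.1, n_samples=200, seed=0):
    """Return the mean and variance of n_samples stochastic forward passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, params, p_drop, rng)
               for _ in range(n_samples)]
    return mean(samples), pvariance(samples)


def make_toy_params(hidden=16, seed=1):
    """Random, untrained weights for a 1 -> hidden -> 1 network (toy data)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]]
    b2 = [0.0]
    return (w1, b1), (w2, b2)


if __name__ == "__main__":
    params = make_toy_params()
    pred_mean, pred_var = mc_dropout_predict(0.5, params, p_drop=0.5)
    print(f"predictive mean ~ {pred_mean:.3f}, sample variance ~ {pred_var:.3f}")
```

With `p_drop=0.0` every pass is identical and the sample variance collapses to zero; with dropout switched on, the spread across passes serves as a simple proxy for model uncertainty. The sample variance here covers only the spread across dropout masks and leaves out any observation-noise term.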