Deep Learning: A Bayesian Perspective
This work addresses the need for better optimization and regularization in deep learning. The contribution is incremental: it builds on existing methods such as SGD and Dropout by integrating Bayesian techniques.
The paper tackles the problem of improving deep learning algorithms for high-dimensional pattern matching by adopting a Bayesian perspective. This yields insights into more efficient optimization and hyper-parameter tuning, and the methodology is illustrated with an analysis of international Airbnb bookings.
Deep learning is a form of machine learning for nonlinear, high-dimensional pattern matching and prediction. By taking a Bayesian probabilistic perspective, we provide a number of insights into more efficient algorithms for optimization and hyper-parameter tuning. Traditional high-dimensional data reduction techniques, such as principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), and projection pursuit regression (PPR), are all shown to be shallow learners. Their deep learning counterparts exploit multiple layers of data reduction, which provide predictive performance gains. Stochastic gradient descent (SGD) for training optimization and Dropout (DO) for regularization provide estimation and variable selection. Bayesian regularization is central to finding the weights and connections in networks that optimize the predictive bias-variance trade-off. To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research.
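One way to see why Dropout counts as a form of regularization is a standard identity for linear regression, sketched below as an illustration rather than as the paper's own derivation. If each input x_j is kept with probability q and rescaled by 1/q ("inverted" dropout, an assumption here), then the expected squared error equals the ordinary squared error plus a data-weighted L2 penalty ((1 - q)/q) Σ_j x_j² w_j². This is a ridge-type penalty, which is commonly read as a Gaussian prior on the weights. The function names `dropout_expected_loss` and `ridge_form` are hypothetical. The check enumerates every mask exactly, so it is only practical for a handful of inputs.

```python
import itertools

import numpy as np


def dropout_expected_loss(x, y, w, keep):
    """Exact E[(y - x_tilde @ w)^2] over all inverted-dropout masks on the inputs.

    Each input is kept independently with probability `keep` and rescaled by
    1/keep, so E[x_tilde] = x. Enumerates all 2**len(x) masks (small inputs only).
    """
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(x)):
        m = np.array(mask)
        # Probability of this particular keep/drop pattern.
        prob = np.prod(np.where(m == 1, keep, 1.0 - keep))
        x_tilde = x * m / keep
        total += prob * (y - x_tilde @ w) ** 2
    return total


def ridge_form(x, y, w, keep):
    """Plain squared error plus the data-weighted L2 penalty implied by dropout."""
    penalty = (1.0 - keep) / keep * np.sum(x**2 * w**2)
    return (y - x @ w) ** 2 + penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, w, y = rng.normal(size=4), rng.normal(size=4), 0.5
    # Both quantities should agree up to floating-point rounding.
    print(dropout_expected_loss(x, y, w, 0.8), ridge_form(x, y, w, 0.8))
```

Summed over observations, the penalty becomes Σ_j w_j² Σ_i x_ij², which is a ridge penalty with data-dependent weights. Lowering the keep probability q strengthens the shrinkage.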