Variational Dropout via Empirical Bayes
This work addresses theoretical inconsistencies in variational dropout and offers researchers in Bayesian deep learning a better-founded framework. The contribution is incremental, since it builds on existing methods.
The paper tackles the theoretical issues of variational dropout by showing that Automatic Relevance Determination (ARD) in Bayesian deep neural networks yields a variational bound similar to that of variational dropout, which provides an alternative Bayesian interpretation of dropout. Experimental results indicate that the two approaches achieve comparable performance, and that hierarchical priors in ARD give higher sparsity at the same accuracy.
We study the Automatic Relevance Determination (ARD) procedure applied to deep neural networks (DNNs). We show that ARD applied to Bayesian DNNs with Gaussian approximate posterior distributions leads to a variational bound similar to that of variational dropout, and that in the case of a fixed dropout rate the two objectives are exactly the same. Experimental results show that the two approaches yield comparable results in practice even when the dropout rates are trained. This leads to an alternative Bayesian interpretation of dropout and mitigates some of the theoretical issues that arise from the use of improper priors in the variational dropout model. Additionally, we explore the use of hierarchical priors in ARD and show that they help achieve higher sparsity for the same accuracy.
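
The relationship between the ARD bound and a dropout rate can be illustrated with a small numerical sketch. It assumes a per-weight parameterization that the abstract does not spell out: a zero-mean Gaussian prior N(0, tau2) whose variance tau2 is fit by empirical Bayes, a Gaussian approximate posterior N(mu, sigma2), and a dropout-style noise ratio alpha = sigma2 / mu^2. The function names are hypothetical and chosen for illustration only. Under these assumptions, the KL term minimized over tau2 reduces to 0.5 * log(1 + 1/alpha), which depends on the weight only through alpha.

```python
import math


def kl_gauss(mu, sigma2, tau2):
    """KL( N(mu, sigma2) || N(0, tau2) ) for scalar Gaussians."""
    return 0.5 * (math.log(tau2 / sigma2) + (sigma2 + mu * mu) / tau2 - 1.0)


def ard_kl(mu, sigma2):
    """KL term after plugging in the prior variance that minimizes it.

    Setting the derivative with respect to tau2 to zero gives
    tau2 = mu^2 + sigma2 (the empirical Bayes estimate in this sketch).
    """
    tau2_star = mu * mu + sigma2
    return kl_gauss(mu, sigma2, tau2_star)


def ard_kl_closed_form(alpha):
    """Closed form of the optimized KL in terms of alpha = sigma2 / mu^2."""
    return 0.5 * math.log1p(1.0 / alpha)


if __name__ == "__main__":
    mu, sigma2 = 0.7, 0.2          # illustrative values, not from the paper
    alpha = sigma2 / (mu * mu)
    print(ard_kl(mu, sigma2), ard_kl_closed_form(alpha))
```

Because the optimized KL term in this sketch is a function of alpha alone, holding alpha fixed makes it a constant, which is consistent with the statement that the objectives coincide when the dropout rate is fixed.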