ST LG ML | Oct 14, 2020

Flexible mean field variational inference using mixtures of non-overlapping exponential families

arXiv:2010.06768v1 | 1.2 | h-index: 23 | Has Code

Originality: Incremental advance

AI Analysis

The paper provides a theoretical fix for mean field variational inference in sparse Bayesian models. The advance is incremental but useful for practitioners in fields such as statistical genetics and dimensionality reduction.

The paper addresses the failure of standard mean field variational inference in models with sparsity-inducing priors such as the spike-and-slab. It shows that a mixture of exponential family distributions with non-overlapping support is itself an exponential family, which remedies the failure, and it motivates the result with two applications: statistical genetics and sparse probabilistic PCA.

Sparse models are desirable for many applications across diverse domains as they can perform automatic variable selection, aid interpretability, and provide regularization. When fitting sparse models in a Bayesian framework, however, analytically obtaining a posterior distribution over the parameters of interest is intractable for all but the simplest cases. As a result practitioners must rely on either sampling algorithms such as Markov chain Monte Carlo or variational methods to obtain an approximate posterior. Mean field variational inference is a particularly simple and popular framework that is often amenable to analytically deriving closed-form parameter updates. When all distributions in the model are members of exponential families and are conditionally conjugate, optimization schemes can often be derived by hand. Yet, I show that using standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors, such as the spike-and-slab. Fortunately, such pathological behavior can be remedied as I show that mixtures of exponential family distributions with non-overlapping support form an exponential family. In particular, any mixture of a diffuse exponential family and a point mass at zero to model sparsity forms an exponential family. Furthermore, specific choices of these distributions maintain conditional conjugacy. I use two applications to motivate these results: one from statistical genetics that has connections to generalized least squares with a spike-and-slab prior on the regression coefficients; and sparse probabilistic principal component analysis. The theoretical results presented here are broadly applicable beyond these two examples.
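The central claim of the abstract, that a mixture of exponential families with non-overlapping support is again an exponential family, can be sketched in a few lines. The notation below ($\pi_k$, $S_k$, $T_k$, $\eta_k$, $A_k$, $h_k$) is assumed for illustration and is not necessarily the paper's own; the argument is the standard one, not a reproduction of the paper's proof.

```latex
% Assumed setup: K components with pairwise disjoint supports S_1, ..., S_K,
% each an exponential family on its own support.
\[
  p(x) = \sum_{k=1}^{K} \pi_k f_k(x),
  \qquad
  f_k(x) = h_k(x)\,\exp\{\eta_k^\top T_k(x) - A_k(\eta_k)\}\,\mathbf{1}\{x \in S_k\}.
\]
% Because the supports are disjoint, exactly one term of the sum is nonzero
% for any x in the union of the S_k, so the log of the sum becomes a sum of logs:
\[
  \log p(x)
  = \sum_{k=1}^{K} \mathbf{1}\{x \in S_k\}\,\bigl[\log \pi_k - A_k(\eta_k)\bigr]
  + \sum_{k=1}^{K} \eta_k^\top \bigl(\mathbf{1}\{x \in S_k\}\,T_k(x)\bigr)
  + \sum_{k=1}^{K} \mathbf{1}\{x \in S_k\}\,\log h_k(x).
\]
% This is linear in the sufficient statistics
%   1{x in S_k}   and   1{x in S_k} T_k(x),
% with natural parameters  log(pi_k) - A_k(eta_k)  and  eta_k,
% and base measure  h(x) = prod_k h_k(x)^{1{x in S_k}}.

% Spike-and-slab special case: a point mass at zero with weight pi_0 and a
% diffuse slab (e.g. Gaussian) with natural parameter eta on x != 0.
% Substituting 1{x != 0} = 1 - 1{x = 0}:
\[
  \log p(x)
  = \mathbf{1}\{x = 0\}\Bigl[\log\frac{\pi_0}{1-\pi_0} + A(\eta)\Bigr]
  + \eta^\top \bigl(\mathbf{1}\{x \neq 0\}\,T(x)\bigr)
  + \mathbf{1}\{x \neq 0\}\,\log h(x)
  - \bigl[A(\eta) - \log(1-\pi_0)\bigr].
\]
```

Here the density is taken with respect to Lebesgue measure plus a unit point mass at zero. The indicator $\mathbf{1}\{x = 0\}$ acts as an additional sufficient statistic, and the last bracket plays the role of the log-partition function of the combined family.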
