An Instability in Variational Inference for Topic Models
This work addresses a critical reliability issue in Bayesian inference for topic models, which are widely used in text and image analysis. Its scope is incremental, since it focuses on a specific instability within existing methods.
The paper identifies an instability in variational inference for topic models, showing that in certain parameter regimes the algorithm outputs a non-trivial decomposition into topics even when the data contain no actual information about the true decomposition, leading to significantly wrong posterior estimates and poor credible region coverage.
Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance each item in a collection of scientific articles) is regarded as a convex combination of a small number of vectors corresponding to 'topics' or 'components'. The weights are assumed to have a Dirichlet prior distribution. The standard approach towards approximating the posterior is to use variational inference algorithms, and in particular a mean field approximation. We show that this approach suffers from an instability that can produce misleading conclusions. Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics. However, for the same parameter values, the data contain no actual information about the true decomposition, and hence the output of the algorithm is uncorrelated with the true topic decomposition. Among other consequences, the estimated posterior mean is significantly wrong, and estimated Bayesian credible regions do not achieve the nominal coverage. We discuss how this instability is remedied by more accurate mean field approximations.
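
The generative structure described in the abstract (each data element is a convex combination of a few topic vectors, with Dirichlet-distributed weights) can be illustrated with a minimal sketch. The abstract does not specify the topic distribution, the noise model, or any parameter values, so the Gaussian topics, the additive Gaussian noise, the function name `sample_topic_model`, and all default arguments below are assumptions made purely for illustration and should not be read as the paper's exact model.

```python
import numpy as np


def sample_topic_model(n_docs=200, dim=50, n_topics=3,
                       alpha=1.0, noise_std=0.1, seed=0):
    """Draw synthetic data from a toy topic model (illustrative only).

    Assumptions not taken from the source: Gaussian topic vectors,
    additive Gaussian noise, a symmetric Dirichlet prior with
    concentration `alpha`, and every default value.
    """
    rng = np.random.default_rng(seed)

    # One vector per topic; rows of `topics` are the components.
    topics = rng.normal(size=(n_topics, dim))

    # Dirichlet weights: each row is nonnegative and sums to 1,
    # so each data element is a convex combination of the topics.
    weights = rng.dirichlet(alpha * np.ones(n_topics), size=n_docs)

    # Observed data: convex combination of topics plus noise (assumed).
    data = weights @ topics + noise_std * rng.normal(size=(n_docs, dim))
    return data, weights, topics


data, weights, topics = sample_topic_model()
```

With synthetic data of this kind, where the true `weights` and `topics` are known, one could compare the output of a variational inference routine against the ground truth, which is the type of check the abstract's claim about uncorrelated estimates refers to.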