Jonathan Huggins

LG
4papers
64citations
Novelty44%
AI Score22

4 Papers

LGMar 1, 2021
Challenges and Opportunities in High-dimensional Variational Inference

Akash Kumar Dhaka, Alejandro Catalina, Manushi Welandawe et al.

Current black-box variational inference (BBVI) methods require the user to make numerous design choices -- such as the selection of variational objective and approximating family -- yet there is little principled guidance on how to do so. We develop a conceptual framework and set of experimental tools to understand the effects of these choices, which we leverage to propose best practices for maximizing posterior approximation accuracy. Our approach is based on studying the pre-asymptotic tail behavior of the density ratios between the joint distribution and the variational approximation, then exploiting insights and tools from the importance sampling literature. Our framework and supporting experiments help to distinguish between the behavior of BBVI methods for approximating low-dimensional versus moderate-to-high-dimensional posteriors. In the latter case, we show that mass-covering variational objectives are difficult to optimize and do not improve accuracy, but flexible variational families can improve accuracy and the effectiveness of importance sampling -- at the cost of additional optimization challenges. Therefore, for moderate-to-high-dimensional posteriors we recommend using the (mode-seeking) exclusive KL divergence since it is the easiest to optimize, and improving the variational family or using model parameter transformations to make the posterior and optimal variational approximation more similar. On the other hand, in low-dimensional settings, we show that heavy-tailed variational families and mass-covering divergences are effective and can increase the chances that the approximation can be improved by importance sampling.

MESep 22, 2020
Independent finite approximations for Bayesian nonparametric inference

Tin D. Nguyen, Jonathan Huggins, Lorenzo Masoero et al.

Completely random measures (CRMs) and their normalizations (NCRMs) offer flexible models in Bayesian nonparametrics. But their infinite dimensionality presents challenges for inference. Two popular finite approximations are truncated finite approximations (TFAs) and independent finite approximations (IFAs). While the former have been well-studied, IFAs lack similarly general bounds on approximation error, and there has been no systematic comparison between the two options. In the present work, we propose a general recipe to construct practical finite-dimensional approximations for homogeneous CRMs and NCRMs, in the presence or absence of power laws. We call our construction the automated independent finite approximation (AIFA). Relative to TFAs, we show that AIFAs facilitate more straightforward derivations and use of parallel computing in approximate inference. We upper bound the approximation error of AIFAs for a wide class of common CRMs and NCRMs -- and thereby develop guidelines for choosing the approximation level. Our lower bounds in key cases suggest that our upper bounds are tight. We prove that, for worst-case choices of observation likelihoods, TFAs are more efficient than AIFAs. Conversely, we find that in real-data experiments with standard likelihoods, AIFAs and TFAs perform similarly. Moreover, we demonstrate that AIFAs can be used for hyperparameter estimation even when other potential IFA options struggle or do not apply.

QMNov 28, 2018
Reconstructing probabilistic trees of cellular differentiation from single-cell RNA-seq data

Miriam Shiffman, William T. Stephenson, Geoffrey Schiebinger et al.

Until recently, transcriptomics was limited to bulk RNA sequencing, obscuring the underlying expression patterns of individual cells in favor of a global average. Thanks to technological advances, we can now profile gene expression across thousands or millions of individual cells in parallel. This new type of data has led to the intriguing discovery that individual cell profiles can reflect the imprint of time or dynamic processes. However, synthesizing this information to reconstruct dynamic biological phenomena from data that are noisy, heterogenous, and sparse---and from processes that may unfold asynchronously---poses a complex computational and statistical challenge. Here, we develop a full generative model for probabilistically reconstructing trees of cellular differentiation from single-cell RNA-seq data. Specifically, we extend the framework of the classical Dirichlet diffusion tree to simultaneously infer branch topology and latent cell states along continuous trajectories over the full tree. In tandem, we construct a novel Markov chain Monte Carlo sampler that interleaves Metropolis-Hastings and message passing to leverage model structure for efficient inference. Finally, we demonstrate that these techniques can recover latent trajectories from simulated single-cell transcriptomes. While this work is motivated by cellular differentiation, we derive a tractable model that provides flexible densities for any data (coupled with an appropriate noise model) that arise from continuous evolution along a latent nonparametric tree.

MLJun 23, 2016
Fast robustness quantification with variational Bayes

Ryan Giordano, Tamara Broderick, Rachael Meager et al.

Bayesian hierarchical models are increasing popular in economics. When using hierarchical models, it is useful not only to calculate posterior expectations, but also to measure the robustness of these expectations to reasonable alternative prior choices. We use variational Bayes and linear response methods to provide fast, accurate posterior means and robustness measures with an application to measuring the effectiveness of microcredit in the developing world.