LG | AI | ML | Feb 6, 2024

The VampPrior Mixture Model

arXiv:2402.04412v3 | 3 citations | h-index: 6 | AISTATS
AI Analysis

This work targets researchers in machine learning and computational biology who need better clustering and more biologically interpretable results from deep learning models. It is incremental in that it builds on existing VampPrior concepts.

The paper addresses the overly simplistic priors used in deep latent variable models such as VAEs by proposing the VampPrior Mixture Model (VMM), a Bayesian GMM prior. In a VAE, the VMM achieves highly competitive clustering performance on benchmark datasets. Integrated into scVI, it significantly improves scRNA-seq integration and automatically arranges cells into clusters with similar biological characteristics.

Widely used deep latent variable models (DLVMs), in particular Variational Autoencoders (VAEs), employ overly simplistic priors on the latent space. To achieve strong clustering performance, existing methods that replace the standard normal prior with a Gaussian mixture model (GMM) require the number of clusters to be set a priori close to the expected number of ground-truth classes and are susceptible to poor initializations. We leverage VampPrior concepts (Tomczak and Welling, 2018) to fit a Bayesian GMM prior, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. In a VAE, the VMM attains highly competitive clustering performance on benchmark datasets. Integrating the VMM into scVI (Lopez et al., 2018), a popular scRNA-seq integration method, significantly improves its performance and automatically arranges cells into clusters with similar biological characteristics.
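
To make the contrast in the abstract concrete, the following is a minimal sketch of what replacing a standard normal latent prior with a Gaussian mixture prior means numerically. It is not the VMM or the authors' implementation: the function names (`log_standard_normal`, `log_gmm_prior`, `responsibilities`), the diagonal covariances, and the fixed mixture parameters are all assumptions made for illustration. In the paper the mixture is fit with a Bayesian treatment built on VampPrior ideas, which this sketch does not attempt.

```python
import numpy as np

def log_standard_normal(z):
    """Log density of N(0, I) at each row of z (shape [n, d])."""
    d = z.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(z ** 2, axis=1))

def _log_joint(z, means, log_vars, weights):
    """log(pi_k) + log N(z | mu_k, diag(exp(log_var_k))), shape [n, k]."""
    diff = z[:, None, :] - means[None, :, :]            # [n, k, d]
    log_comp = -0.5 * np.sum(
        np.log(2 * np.pi) + log_vars[None] + diff ** 2 / np.exp(log_vars)[None],
        axis=2,
    )
    return log_comp + np.log(weights)[None, :]

def log_gmm_prior(z, means, log_vars, weights):
    """Log density of the (hypothetical) mixture prior at each row of z."""
    lj = _log_joint(z, means, log_vars, weights)
    m = lj.max(axis=1, keepdims=True)                   # stable log-sum-exp
    return (m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True)))[:, 0]

def responsibilities(z, means, log_vars, weights):
    """Posterior probability of each mixture component for each row of z."""
    lj = _log_joint(z, means, log_vars, weights)
    return np.exp(lj - log_gmm_prior(z, means, log_vars, weights)[:, None])
```

With a mixture prior, taking the arg-max of `responsibilities` over components gives a cluster assignment for each latent code, which is why the choice and fitting of the prior matters for clustering performance.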

Code Implementations: 1 repo