ML · Jun 18, 2015

Dependent Multinomial Models Made Easy: Stick Breaking with the Pólya-Gamma Augmentation

arXiv:1506.05843v1 · 107 citations
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners in fields such as genomics and natural language processing a practical way to model dependent multinomial data. It is an incremental advance, since it builds on existing augmentation techniques.

The paper tackles the problem of modeling dependencies in multinomial data, such as DNA sequences or text. It rewrites the multinomial distribution using a logistic stick-breaking representation and then applies Pólya-gamma augmentation, which yields latent variables with jointly Gaussian likelihoods and so enables efficient Bayesian inference.
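The stick-breaking step can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code, and all helper names (`stick_breaking`, `binomial_counts`, `log_stick_breaking_binomials`) are hypothetical. It uses the standard logistic stick-breaking map π_1 = σ(ψ_1), π_k = σ(ψ_k) ∏_{j<k} (1 − σ(ψ_j)), with the last of the K categories taking whatever stick is left, and checks numerically that a multinomial likelihood factors into K − 1 binomial likelihoods with counts N_k = N − Σ_{j<k} x_j.

```python
import numpy as np
from math import lgamma, log


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def stick_breaking(psi):
    """Map psi in R^{K-1} to a probability vector pi on the K-simplex."""
    p = sigmoid(np.asarray(psi, dtype=float))
    # Stick left before each break: 1, (1 - p_1), (1 - p_1)(1 - p_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - p)))
    # Category k takes fraction p_k of what is left; the last category takes the rest.
    return np.append(p * remaining[:-1], remaining[-1])


def binomial_counts(x):
    """N_k = N - sum_{j<k} x_j for k = 1..K-1."""
    x = np.asarray(x)
    return x.sum() - np.concatenate(([0], np.cumsum(x)[:-2]))


def log_multinomial(x, pi):
    """log Mult(x | N, pi)."""
    N = sum(x)
    return (lgamma(N + 1) - sum(lgamma(xk + 1) for xk in x)
            + sum(xk * log(pk) for xk, pk in zip(x, pi)))


def log_stick_breaking_binomials(x, psi):
    """Sum of log Bin(x_k | N_k, sigmoid(psi_k)) over the K-1 breaks."""
    p = sigmoid(np.asarray(psi, dtype=float))
    total = 0.0
    for xk, nk, pk in zip(x[:-1], binomial_counts(x), p):
        total += (lgamma(nk + 1) - lgamma(xk + 1) - lgamma(nk - xk + 1)
                  + xk * log(pk) + (nk - xk) * log(1.0 - pk))
    return total


x = [3, 2, 1, 4]
psi = np.array([0.3, -1.2, 0.7])
pi = stick_breaking(psi)
print(pi, pi.sum())
# The two log-likelihoods should agree up to floating-point error.
print(log_multinomial(x, pi), log_stick_breaking_binomials(x, psi))
```

Because each factor is a binomial in σ(ψ_k), an augmentation that handles a single logistic-binomial term can be applied to every factor independently.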

Many practical modeling problems involve discrete data that are best represented as draws from multinomial or categorical distributions. For example, nucleotides in a DNA sequence, children's names in a given state and year, and text documents are all commonly modeled with multinomial distributions. In all of these cases, we expect some form of dependency between the draws: the nucleotide at one position in the DNA strand may depend on the preceding nucleotides, children's names are highly correlated from year to year, and topics in text may be correlated and dynamic. These dependencies are not naturally captured by the typical Dirichlet-multinomial formulation. Here, we leverage a logistic stick-breaking representation and recent innovations in Pólya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling us to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead.
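Given the binomial factorization, Pólya-gamma augmentation adds one auxiliary variable ω_k ~ PG(N_k, ψ_k) per factor. Conditioned on ω_k, the likelihood of ψ_k is proportional to exp(κ_k ψ_k − ω_k ψ_k² / 2) with κ_k = x_k − N_k / 2, which is Gaussian in ψ_k. The sketch below alternates these two conditionals in a Gibbs sampler for a single count vector. Assumptions, all ours: the Gaussian prior N(μ, Σ) on ψ is fixed (in a dependent model it would instead come from whatever Gaussian model couples the ψ's), every function name is hypothetical, and PG draws are approximated by truncating the defining infinite sum of gamma variables rather than using an exact sampler.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def stick_breaking(psi):
    """Map psi in R^{K-1} to pi on the K-simplex (logistic stick-breaking)."""
    p = sigmoid(psi)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - p)))
    return np.append(p * remaining[:-1], remaining[-1])


def binomial_counts(x):
    """N_k = N - sum_{j<k} x_j for k = 1..K-1."""
    return x.sum() - np.concatenate(([0], np.cumsum(x)[:-2]))


def sample_pg_approx(b, c, rng, n_terms=200):
    """Approximate PG(b, c) draws by truncating the infinite sum of gammas:
    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(b, 1).
    """
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    shape = np.where(b > 0, b, 1.0)[:, None]  # dummy shape where b == 0
    g = rng.gamma(shape, 1.0, size=(len(b), n_terms))
    omega = (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)
    return np.where(b > 0, omega, 0.0)  # PG(0, c) is a point mass at 0


def gibbs_step(psi, x, mu, Sigma_inv, rng):
    """One sweep: draw omega | psi, then psi | omega from its Gaussian conditional."""
    N_k = binomial_counts(x)
    kappa = x[:-1] - N_k / 2.0
    omega = sample_pg_approx(N_k, psi, rng)
    prec = Sigma_inv + np.diag(omega)  # posterior precision of psi
    mean = np.linalg.solve(prec, Sigma_inv @ mu + kappa)
    L = np.linalg.cholesky(prec)  # prec = L L^T
    z = rng.standard_normal(len(psi))
    return mean + np.linalg.solve(L.T, z)  # noise has covariance (L L^T)^{-1}


def run_gibbs(x, n_iter=500, burn_in=100, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    K = len(x)
    mu, Sigma_inv = np.zeros(K - 1), np.eye(K - 1)  # assumed N(0, I) prior on psi
    psi = np.zeros(K - 1)
    samples = []
    for it in range(n_iter):
        psi = gibbs_step(psi, x, mu, Sigma_inv, rng)
        if it >= burn_in:
            samples.append(stick_breaking(psi))
    return np.array(samples)


samples = run_gibbs([300, 200, 100, 400])
print(samples.mean(axis=0))
```

With 1,000 draws split as [300, 200, 100, 400], the posterior mean of π should land close to the empirical proportions. In this sketch, a different Gaussian prior on ψ would only change `mu` and `Sigma_inv` in the ψ update, which illustrates why a Gaussian reformulation makes existing Gaussian inference machinery easy to reuse.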

Code Implementations: 1 repo
