ML, LG · Oct 31, 2018

Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states

arXiv:1810.13431v3 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses a domain-specific problem for researchers and practitioners who use MCMC for hidden Markov models on imbalanced data, and offers an incremental improvement over existing sub-sampling methods.

The paper tackles inaccurate inference and prediction of rare latent states in hidden Markov models fitted with sub-sampling stochastic gradient MCMC. It proposes a targeted sub-sampling approach that over-samples rare-state observations to reduce gradient variance, and reports substantial gains in predictive and inferential accuracy on real and synthetic examples.

Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler, which becomes computationally slow as the length of the time series increases. This has motivated sub-sampling-based approaches, which approximate the full posterior within stochastic gradient MCMC by using small random subsequences of the data at each MCMC iteration. When the data are imbalanced because some latent states are rare, subsequences often exclude the rare-state data, leading to inaccurate inference and poor prediction or detection of rare events. We propose a targeted sub-sampling (TASS) approach that over-samples observations corresponding to rare latent states when calculating the stochastic gradient of the parameters associated with them. TASS uses an initial clustering of the data to construct subsequence weights that reduce the variance of the gradient estimate. This improves sampling efficiency, particularly in settings where the rare latent states correspond to extreme observations. We demonstrate substantial gains in predictive and inferential accuracy on real and synthetic examples.
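The core idea in the abstract is to draw subsequences with non-uniform weights built from an initial clustering, then reweight so the gradient estimate stays unbiased while its variance drops. The following is a minimal sketch of that idea under loud assumptions; it is not the authors' TASS implementation. The quantile threshold standing in for the initial clustering, the Gaussian toy gradient for a rare-state mean, and every function name (`rare_labels`, `subsequence_weights`, `per_subsequence_grads`, `estimator_moments`, `sample_gradient`) are hypothetical. The sketch also ignores the HMM's temporal dependence by treating the cluster labels as if they were the latent states.

```python
import numpy as np


def rare_labels(y, quantile=0.995):
    """Hypothetical stand-in for the initial clustering: flag observations
    above an empirical quantile as belonging to the rare (extreme) state."""
    return y > np.quantile(y, quantile)


def subsequence_weights(labels, length, eps=1e-3):
    """Sampling probabilities for non-overlapping subsequences, proportional
    to the number of flagged observations in each. The eps floor keeps every
    probability positive, which unbiasedness requires."""
    n_sub = len(labels) // length
    counts = labels[: n_sub * length].reshape(n_sub, length).sum(axis=1)
    w = counts + eps
    return w / w.sum()


def per_subsequence_grads(y, labels, mu_rare, sigma, length):
    """Toy gradient of a Gaussian log-likelihood w.r.t. the rare-state mean,
    summed within each subsequence. Cluster labels replace latent states here
    (an assumption; an HMM gradient would involve state probabilities)."""
    n_sub = len(y) // length
    yy = y[: n_sub * length].reshape(n_sub, length)
    ll = labels[: n_sub * length].reshape(n_sub, length)
    return (ll * (yy - mu_rare) / sigma**2).sum(axis=1)


def estimator_moments(g, p):
    """Exact mean and variance of the importance-weighted estimator g[I]/p[I]
    when one subsequence I is drawn with probabilities p. The mean equals
    g.sum(), the full-data gradient, for any p with all entries positive."""
    mean = np.sum(p * (g / p))
    var = np.sum(g**2 / p) - mean**2
    return mean, var


def sample_gradient(g, p, rng):
    """One stochastic gradient draw: pick a subsequence, rescale by 1/p."""
    i = rng.choice(len(g), p=p)
    return g[i] / p[i]


# Usage on synthetic data: 50 extreme observations hidden in 10,000 points.
rng = np.random.default_rng(0)
T, L = 10_000, 100
y = rng.normal(0.0, 1.0, T)
rare_idx = rng.choice(T, size=50, replace=False)
y[rare_idx] += 8.0

labels = rare_labels(y)
g = per_subsequence_grads(y, labels, mu_rare=7.0, sigma=1.0, length=L)
p_uniform = np.full(len(g), 1.0 / len(g))
p_targeted = subsequence_weights(labels, L)

print("uniform  (mean, var):", estimator_moments(g, p_uniform))
print("targeted (mean, var):", estimator_moments(g, p_targeted))
print("one targeted draw:", sample_gradient(g, p_targeted, rng))
```

Under uniform sampling most draws land on subsequences with no flagged observations and contribute a zero gradient for the rare-state mean, so the rescaled estimate swings widely. Weighting by flagged-observation counts concentrates draws where that gradient is nonzero, and dividing by `p[i]` keeps the estimator unbiased. The small `eps` is a design choice that keeps every subsequence reachable even if the clustering misses some rare observations.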

Code implementations: 1 repo
