ML | LG | CO | Jun 22, 2012

Hidden Markov Models with mixtures as emission distributions

arXiv:1206.5102v1 | 54 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more adaptable HMMs in fields like biology, though it is incremental as it builds on existing parametric methods.

The paper addresses the limited flexibility of Hidden Markov Models in unsupervised classification by proposing semiparametric emission distributions defined as mixtures. It shows that an adapted EM algorithm, combined with hierarchical initialization and model selection criteria, achieves accurate classification in simulations and on a biological dataset.

In unsupervised classification, Hidden Markov Models (HMMs) are used to account for a neighborhood structure between observations. The emission distributions are often assumed to belong to some parametric family. In this paper, a semiparametric model in which the emission distributions are mixtures of parametric distributions is proposed to gain flexibility. We show that the classical EM algorithm can be adapted to infer the model parameters. For the initialisation step, a hierarchical method is proposed that starts from a large number of components and combines them into the hidden states. Three likelihood-based criteria for selecting the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination of merging criterion and model selection criterion and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
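
To make the model structure concrete, here is a minimal sketch of an HMM whose emission distribution in each hidden state is itself a mixture. It is not the HMMmix implementation: it assumes univariate Gaussian components, and the function names (`normal_pdf`, `mixture_emission`, `forward_loglik`) are hypothetical, chosen only for illustration. It evaluates the observed-data log-likelihood with a scaled forward recursion, which is the quantity an EM algorithm would increase and that BIC-like criteria would penalize.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian (assumed component family)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_emission(x, components):
    """Emission density of one hidden state, modeled as a mixture.

    components: list of (weight, mean, sd) tuples whose weights sum to 1.
    """
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def forward_loglik(obs, init, trans, emissions):
    """Log-likelihood of obs under the HMM, via the scaled forward recursion.

    init:      initial state probabilities, length K
    trans:     K x K transition matrix, trans[j][k] = P(next = k | current = j)
    emissions: list of K component lists, one mixture per hidden state
    """
    n_states = len(init)
    # Forward variable at the first observation.
    alpha = [init[k] * mixture_emission(obs[0], emissions[k])
             for k in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the chain, then weight by the emission density.
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(n_states))
                * mixture_emission(obs[t], emissions[k])
                for k in range(n_states)
            ]
        # Rescale to avoid underflow; the log scales sum to the log-likelihood.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Illustrative parameters (made up): two hidden states, two components each.
emissions = [
    [(0.6, -2.0, 1.0), (0.4, -0.5, 0.5)],  # state 0
    [(0.5, 1.0, 0.7), (0.5, 3.0, 1.0)],    # state 1
]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
obs = [-1.8, -0.4, 0.9, 2.7, 3.1]
print(forward_loglik(obs, init, trans, emissions))
```

With a single hidden state the recursion reduces to an ordinary mixture likelihood for independent observations, which is a convenient sanity check for such a sketch.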
