Spectral Learning of Binomial HMMs for DNA Methylation Data
This work addresses a computational bottleneck for researchers analyzing large-scale DNA methylation data in genomics. The contribution is incremental, however, because it extends existing spectral methods to one specific model type.
The authors address the problem of efficiently learning the parameters of Binomial Hidden Markov Models for DNA methylation data, a task that is computationally expensive with the standard EM algorithm. They develop a new feature-map based spectral algorithm that retains computational efficiency and comes with theoretical guarantees, and it achieves competitive performance on real data.
We consider learning the parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data. The standard algorithm for this problem is EM, which is computationally expensive for sequences at the scale of the mammalian genome. Recently developed spectral algorithms can learn the parameters of latent variable models via tensor decomposition, and are highly efficient on large data. However, these methods have only been applied to categorical HMMs, and the main challenge is how to extend them to Binomial HMMs while still retaining computational efficiency. We address this challenge by introducing a new feature-map based approach that exploits specific properties of Binomial HMMs. We provide theoretical performance guarantees for our algorithm and evaluate it on real DNA methylation data.
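To make the model class concrete, the following is a minimal sketch of the generative process of a Binomial HMM, under the usual reading that each site has a hidden state, a read coverage, and a methylated-read count drawn from a binomial whose success probability depends on the state. It illustrates the model only, not the spectral algorithm or its feature map, which the abstract does not specify. The function name `sample_binomial_hmm` and all parameter names and values are assumptions made for illustration.

```python
import random


def sample_binomial_hmm(T, pi, A, p, coverage, seed=0):
    """Sample hidden states and methylated-read counts from a Binomial HMM.

    Illustrative assumptions (not taken from the paper):
      pi[k]       initial probability of hidden state k
      A[i][j]     probability of moving from state i to state j
      p[k]        methylation probability in state k
      coverage[t] number of reads covering site t
    """
    rng = random.Random(seed)
    n_states = len(pi)
    states, counts = [], []
    h = rng.choices(range(n_states), weights=pi)[0]
    for t in range(T):
        # Binomial(coverage[t], p[h]) drawn as a sum of Bernoulli trials.
        c = sum(rng.random() < p[h] for _ in range(coverage[t]))
        states.append(h)
        counts.append(c)
        # Move to the next hidden state according to row h of A.
        h = rng.choices(range(n_states), weights=A[h])[0]
    return states, counts


# Hypothetical two-state example: a low- and a high-methylation state.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
p = [0.1, 0.9]
coverage = [10, 4, 7, 12, 3, 8]
states, counts = sample_binomial_hmm(len(coverage), pi, A, p, coverage)
```

Unlike a categorical HMM, the observation at each site here is a pair (coverage, count) whose range varies from site to site, which is one way to see why methods designed for a fixed finite alphabet do not carry over directly.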