Spectral Learning of Large Structured HMMs for Comparative Epigenomics
This work addresses a computational bottleneck in comparative epigenomics that affects researchers analyzing large chromatin datasets. The contribution is incremental: it builds on existing spectral methods and adds optimizations that exploit the tree structure of the model.
The authors address the problem of learning parameters for large structured Hidden Markov Models (HMMs) across multiple cell types. Iterative methods such as EM are very slow on these models, and naive spectral methods need time and space exponential in the number of cell types. The authors develop a spectral algorithm that exploits the tree structure of the hidden states to improve efficiency. They provide sample complexity bounds, evaluate the algorithm on biological data from nine human cell types, and show that some of the algorithmic ideas carry over to other graphical models.
We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types. A natural model for chromatin data in one cell type is a Hidden Markov Model (HMM); we model the relationship between multiple cell types by connecting their hidden states by a fixed tree of known structure. The main challenge with learning parameters of such models is that iterative methods such as EM are very slow, while naive spectral methods result in time and space complexity exponential in the number of cell types. We exploit properties of the tree structure of the hidden states to provide spectral algorithms that are more computationally efficient for current biological datasets. We provide sample complexity bounds for our algorithm and evaluate it experimentally on biological data from nine human cell types. Finally, we show that beyond our specific model, some of our algorithmic ideas can be applied to other graphical models.
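To make the exponential blow-up concrete, the following minimal sketch counts table entries under two representations of the model. It is an illustration of model size only, not the authors' algorithm or its actual running time. The number of hidden states per cell type (`m = 6`) is a hypothetical value, and the factored count assumes that each non-root cell type's hidden state depends on its own previous state and on its parent's current state in the tree, which the abstract does not spell out.

```python
def joint_state_count(num_states: int, num_cell_types: int) -> int:
    """Number of joint hidden states if all chains are merged into one flat HMM."""
    return num_states ** num_cell_types


def naive_transition_entries(num_states: int, num_cell_types: int) -> int:
    """Entries in the transition matrix of the merged (product-space) HMM."""
    n = joint_state_count(num_states, num_cell_types)
    return n * n


def tree_factored_entries(num_states: int, num_cell_types: int) -> int:
    """Entries in per-node transition tables under the assumed tree factorization.

    Assumption (not stated in the abstract): the root has an m x m table, and
    every other node has an m x m x m table indexed by its own previous state,
    its parent's current state, and its own current state.
    """
    m = num_states
    return m * m + (num_cell_types - 1) * m ** 3


if __name__ == "__main__":
    m, c = 6, 9  # m is hypothetical; c = 9 matches the nine cell types
    print("joint states:", joint_state_count(m, c))
    print("naive transition entries:", naive_transition_entries(m, c))
    print("tree-factored entries:", tree_factored_entries(m, c))
```

With these hypothetical values the merged HMM has 6^9 = 10,077,696 joint states, so its transition matrix alone has about 10^14 entries, while the factored tables hold 1,764 entries in total. The gap grows exponentially with each added cell type, which is why a method that works on the product space does not scale.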