Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations
This work addresses a fundamental problem in sequential data modeling and offers researchers and practitioners an interpretable alternative to existing methods, though the contribution appears incremental because it builds on prior gradient-based and neural network approaches.
The paper tackles the challenge of learning Hidden Markov Model (HMM) parameters from observations by introducing Belief Net, a gradient-based framework that formulates the forward filter as a neural network. On synthetic data, it converges faster than Baum-Welch and recovers parameters in settings where spectral methods fail.
Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch (EM) algorithm are computationally intensive and prone to local optima, while modern spectral algorithms offer provable guarantees but may produce probability outputs outside the valid range. This work introduces Belief Net, a novel framework that learns HMM parameters through gradient-based optimization by formulating the HMM's forward filter as a structured neural network. Unlike black-box Transformer models, Belief Net's learnable weights are explicitly the logits of the initial distribution, transition matrix, and emission matrix, ensuring full interpretability. The model processes observation sequences using a decoder-only architecture and is trained end-to-end with a standard autoregressive next-observation prediction loss. On synthetic HMM data, Belief Net converges faster than Baum-Welch and successfully recovers parameters in both undercomplete and overcomplete settings where spectral methods fail. Comparisons with Transformer-based models are also presented on real-world language data.
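The forward-filter formulation described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes discrete observations and a row-wise softmax over each set of logits, and it computes only the forward pass and the average next-observation negative log-likelihood. Gradients would come from an autodiff framework, which is omitted here. The names `softmax` and `next_obs_nll` are hypothetical.

```python
import math


def softmax(logits):
    """Map a list of logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_obs_nll(init_logits, trans_logits, emit_logits, observations):
    """Average next-observation negative log-likelihood under an HMM.

    Assumed layout (illustrative, not taken from the paper):
      init_logits[i]     -> logit of starting in hidden state i
      trans_logits[i][j] -> logit of moving from state i to state j
      emit_logits[i][y]  -> logit of emitting symbol y in state i
    """
    pi = softmax(init_logits)
    A = [softmax(row) for row in trans_logits]
    B = [softmax(row) for row in emit_logits]
    n = len(pi)

    # Belief over the current hidden state given all earlier observations.
    predicted = pi
    total = 0.0
    for y in observations:
        # Predictive probability of the observation under the current belief.
        p_y = sum(predicted[i] * B[i][y] for i in range(n))
        total -= math.log(p_y)
        # Bayes update: condition the belief on the observed symbol.
        posterior = [predicted[i] * B[i][y] / p_y for i in range(n)]
        # Propagate the belief one step through the transition matrix.
        predicted = [
            sum(posterior[i] * A[i][j] for i in range(n)) for j in range(n)
        ]
    return total / len(observations)


# Usage: two hidden states, three observation symbols, all logits zero.
loss = next_obs_nll(
    [0.0, 0.0],
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [0, 2, 1, 1],
)
```

With all logits set to zero, every predictive distribution is uniform, so the loss equals the logarithm of the number of observation symbols. Because the three sets of logits are the only parameters, applying softmax to the trained values gives the HMM's initial, transition, and emission probabilities directly.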