Moving Toward High Precision Dynamical Modelling in Hidden Markov Models
This work addresses a bottleneck in speech recognition systems by enabling more precise temporal modelling, though it is incremental as it builds on existing HMM frameworks.
The paper tackles the limited choice of topologies in Hidden Markov Models (HMMs) for speech recognition. It proposes a framework that learns efficient topologies by pruning complex models, which captures complex time dependencies better than classical left-to-right models.
The hidden Markov model (HMM) is often regarded as the dynamical model of choice in many fields and applications. It has also been at the heart of most state-of-the-art speech recognition systems since the 1970s. However, from Gaussian mixture model HMMs (GMM-HMM) to deep neural network HMMs (DNN-HMM), the underlying Markov chain of state-of-the-art models has not changed much. The "left-to-right" topology is almost always employed because very few alternatives exist. In this paper, we argue that finely tuned HMM topologies are essential for precise temporal modelling and that this approach should be investigated in state-of-the-art HMM systems. To that end, we propose a proof-of-concept framework for learning efficient topologies by pruning down complex generic models. Our speech recognition experiments indicate that this approach can learn complex time dependencies better than classical "left-to-right" models.
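To make the contrast between a fixed "left-to-right" topology and a pruned generic one concrete, the following minimal sketch may help. It is not the paper's method: the abstract does not specify the pruning criterion, so a simple probability threshold is assumed here, and the function names (`left_to_right`, `prune`, `topology`), the threshold value, and the example transition matrix are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def left_to_right(n_states: int, self_loop: float = 0.5) -> Matrix:
    """Classical left-to-right chain: each state may only loop or advance by one."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            a[i][i] = self_loop
            a[i][i + 1] = 1.0 - self_loop
        else:
            a[i][i] = 1.0  # last state only loops (no explicit exit state here)
    return a


def prune(a: Matrix, threshold: float) -> Matrix:
    """Zero out transitions below `threshold`, then renormalise each row.

    Assumption: every row is a valid probability distribution. The largest
    entry of each row is always kept so that no row becomes empty.
    """
    pruned = []
    for row in a:
        row_max = max(row)
        kept = [p if (p >= threshold or p == row_max) else 0.0 for p in row]
        total = sum(kept)
        pruned.append([p / total for p in kept])
    return pruned


def topology(a: Matrix) -> List[List[int]]:
    """For each state, list the states reachable in one step."""
    return [[j for j, p in enumerate(row) if p > 0.0] for row in a]


# Hypothetical transition matrix of a trained, fully connected 3-state model.
ergodic = [
    [0.60, 0.30, 0.10],
    [0.02, 0.58, 0.40],
    [0.25, 0.01, 0.74],
]
pruned = prune(ergodic, threshold=0.05)

print(topology(left_to_right(3)))  # [[0, 1], [1, 2], [2]]
print(topology(pruned))            # [[0, 1, 2], [1, 2], [0, 2]]
```

In this toy example the pruned model keeps a skip transition (state 0 to state 2) and a backward transition (state 2 to state 0) that a left-to-right chain cannot represent, while dropping the two transitions that the assumed training left with negligible probability.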