SD · AS · Apr 6, 2021

Towards Consistent Hybrid HMM Acoustic Modeling

arXiv:2104.02387v3 · 5 citations
Originality: Incremental advance
AI Analysis

This work simplifies ASR training for researchers and practitioners by eliminating the need for complex clustering pipelines, though it is incremental as it builds on existing hybrid models.

The authors tackled the complexity of training hybrid ASR systems by proposing a flat-start factored hybrid model that explicitly models all triphone states without clustering, simplifying the training pipeline. Their models achieved competitive performance on the Switchboard task compared to existing clustered and flat-start methods.

High-performance hybrid automatic speech recognition (ASR) systems are often trained with clustered triphone outputs, and thus require a complex training pipeline to generate the clustering. The same complex pipeline is often utilized in order to generate an alignment for use in frame-wise cross-entropy training. In this work, we propose a flat-start factored hybrid model trained by modeling the full set of triphone states explicitly without relying on clustering methods. This greatly simplifies the training of new models. Furthermore, we study the effect of different alignments used for Viterbi training. Our proposed models achieve competitive performance on the Switchboard task compared to systems using clustered triphones and other flat-start models in the literature.
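To make the idea of modeling the full set of triphone states without clustering more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "factored" means the joint posterior over (left context, center state, right context) is decomposed with the chain rule into three smaller distributions; the particular ordering of the factors, the function names, and the inventory sizes (45 phonemes, 3 HMM states per phoneme) are illustrative assumptions and are not taken from the paper.

```python
import math


def joint_label_count(num_phonemes: int, states_per_phoneme: int) -> int:
    """Number of distinct triphone states when nothing is clustered.

    Each label is a (left phoneme, center phoneme state, right phoneme) triple.
    """
    return num_phonemes * (num_phonemes * states_per_phoneme) * num_phonemes


def factored_output_count(num_phonemes: int, states_per_phoneme: int) -> int:
    """Total output units if each factor gets its own softmax layer.

    Assumption: one output layer for the left context, one for the center
    state, and one for the right context.
    """
    return num_phonemes + num_phonemes * states_per_phoneme + num_phonemes


def joint_log_posterior(
    log_p_left: float,
    log_p_center_given_left: float,
    log_p_right_given_center_left: float,
) -> float:
    """Chain rule in log space (factor ordering is an assumption):

    log p(l, c, r | x) = log p(l | x) + log p(c | l, x) + log p(r | c, l, x)
    """
    return log_p_left + log_p_center_given_left + log_p_right_given_center_left


if __name__ == "__main__":
    # Hypothetical inventory sizes, chosen only for illustration.
    print(joint_label_count(45, 3))      # 273375 unclustered triphone states
    print(factored_output_count(45, 3))  # 225 output units across three softmaxes
    # Toy probabilities for a single frame: 0.5 * 0.4 * 0.25 = 0.05
    print(math.exp(joint_log_posterior(math.log(0.5), math.log(0.4), math.log(0.25))))
```

Under these assumed sizes, a single softmax over all unclustered triphone states would need hundreds of thousands of outputs, which is the usual motivation for state clustering. Factoring keeps every triphone state explicitly representable while each individual output layer stays small.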
