LG CV | Oct 17, 2022

A Treatise On FST Lattice Based MMI Training

Adnan Haider, Tim Ng, Zhen Huang, Xingyu Na, Antti Veikko Rosti

arXiv:2210.08918v1 | 1.8 | h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses efficiency and accuracy issues in speech recognition training for tasks such as dictation and voice assistants; it is an incremental contribution that refines an existing framework.

The paper examines the implicit modeling decisions in finite state transducer (FST) lattice-based maximum mutual information (MMI) training for speech recognition and shows that determinizing denominator lattices on the fly guarantees discrimination at the hypothesis level. The approach achieves a 2.3-4.6% relative word error rate reduction (WERR) over the standard FST lattice-based approach on Mandarin and English datasets.
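Relative WER reduction is the drop in word error rate divided by the baseline WER, so it depends on how strong the baseline already is. The minimal Python sketch below makes that arithmetic explicit; the function name and the WER values in the example are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction (WERR) in percent.

    WERR = 100 * (baseline_wer - new_wer) / baseline_wer, so a positive
    value means the new system has a lower word error rate than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs in percent (not numbers from the paper).
    baseline, improved = 10.0, 9.6
    print(f"WERR: {relative_wer_reduction(baseline, improved):.1f}%")
```

Running it prints `WERR: 4.0%`. Read the other way, a 2.3% relative reduction on a hypothetical 10% baseline WER is only about 0.23 absolute points, which is why relative figures are best interpreted alongside the baseline.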

Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify and bring forward the implicit modelling decisions induced by the design and implementation of the standard finite state transducer (FST) lattice based MMI training framework. The paper particularly investigates the necessity of maintaining a preselected numerator alignment and highlights the importance of determinizing FST denominator lattices on the fly. On-the-fly FST lattice determinization is shown mathematically to guarantee discrimination at the hypothesis level, and its efficacy is demonstrated empirically by training deep CNN models on an 18K-hour Mandarin dataset and a 2.8K-hour English dataset. On assistant and dictation tasks, the approach achieves a 2.3-4.6% relative WER reduction (WERR) over the standard FST lattice based approach.
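The difference between scoring one preselected numerator alignment and discriminating between whole hypotheses can be illustrated on a toy example. The sketch below is not the paper's implementation and does not perform real FST determinization. It assumes a denominator lattice already flattened into an explicit list of (word sequence, alignment id, log score) paths, where each score stands in for a scaled acoustic plus language model log-likelihood. Collapsing paths that share a word sequence (summing them in the log semiring, or keeping only the best one in the tropical semiring) mimics the effect determinization has on the set of hypotheses: each word sequence appears exactly once. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy denominator lattice, flattened into explicit paths:
# (word sequence, alignment id, log score). The log score stands in for a
# scaled acoustic + language model log-likelihood of that path.
LatticePath = tuple[tuple[str, ...], str, float]


def logsumexp(xs: list[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def determinize_paths(paths: list[LatticePath],
                      semiring: str = "log") -> dict[tuple[str, ...], float]:
    """Collapse all paths sharing a word sequence into one hypothesis.

    "log" sums the path probabilities of each word sequence (logsumexp);
    "tropical" keeps only the best path, i.e. one alignment per hypothesis.
    """
    if semiring not in ("log", "tropical"):
        raise ValueError("semiring must be 'log' or 'tropical'")
    grouped: dict[tuple[str, ...], list[float]] = defaultdict(list)
    for words, _alignment, score in paths:
        grouped[words].append(score)
    combine = logsumexp if semiring == "log" else max
    return {words: combine(scores) for words, scores in grouped.items()}


def path_posterior(paths: list[LatticePath], alignment_id: str) -> float:
    """Posterior of one preselected alignment, normalized over every path."""
    z = logsumexp([score for _w, _a, score in paths])
    score = next(s for _w, a, s in paths if a == alignment_id)
    return math.exp(score - z)


def hypothesis_posteriors(
        paths: list[LatticePath]) -> dict[tuple[str, ...], float]:
    """Posterior of each distinct word sequence (hypothesis level)."""
    hyps = determinize_paths(paths, "log")
    z = logsumexp(list(hyps.values()))
    return {words: math.exp(s - z) for words, s in hyps.items()}


toy_paths: list[LatticePath] = [
    (("hello", "world"), "a1", -1.0),  # preselected numerator alignment
    (("hello", "world"), "a2", -1.2),  # same words, different alignment
    (("hello", "word"), "a3", -1.5),   # genuinely competing hypothesis
]

if __name__ == "__main__":
    print("unique hypotheses:", len(determinize_paths(toy_paths)))
    print("posterior of alignment a1:",
          round(path_posterior(toy_paths, "a1"), 3))
    print("hypothesis posteriors:",
          {" ".join(w): round(p, 3)
           for w, p in hypothesis_posteriors(toy_paths).items()})
```

With these toy scores, alignment `a1` on its own receives about 0.41 of the posterior mass, while the word sequence "hello world" as a hypothesis receives about 0.75, because alignment `a2` of the same words is no longer treated as a competitor. In practice lattice posteriors are computed with forward-backward over the FST rather than by enumerating paths.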

