A Comparison of Lattice-free Discriminative Training Criteria for Purely Sequence-Trained Neural Network Acoustic Models
This work provides incremental improvements to training methods for speech recognition systems.
The paper compared three lattice-free discriminative training criteria (MMI, bMMI, sMBR) for neural network acoustic models in LVCSR tasks, showing that LF-bMMI outperforms LF-MMI with a 5% relative WER reduction on Switchboard datasets.
In this work, three lattice-free (LF) discriminative training criteria for purely sequence-trained neural network acoustic models are compared on LVCSR tasks, namely maximum mutual information (MMI), boosted maximum mutual information (bMMI) and state-level minimum Bayes risk (sMBR). We demonstrate that, analogous to LF-MMI, a neural network acoustic model can also be trained from scratch using LF-bMMI or LF-sMBR criteria respectively without the need of cross-entropy pre-training. Furthermore, experimental results on Switchboard-300hrs and Switchboard+Fisher-2100hrs datasets show that models trained with LF-bMMI consistently outperform those trained with plain LF-MMI and achieve a relative word error rate (WER) reduction of 5% over competitive temporal convolution projected LSTM (TDNN-LSTMP) LF-MMI baselines.